Today we draw a line plot and learn to filter the data.
Welcome to Day 8 of “Viz with Me”!
Yesterday, we learnt how to use shapes to represent groups in a plot.
Goals for today:
1. Draw a line plot
2. Learn to filter the data to extract the rows with a particular value of a variable
Today, we’re taking a slight turn and using the gapminder dataset (Bryan 2023) to draw a line chart. This will also give us a chance to revisit installing and loading packages, which we covered on Day 1.
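install.packages("gapminder")  # install once (downloads the package from CRAN)
library(gapminder)             # load it in every new R session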
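Because install.packages() downloads the package again every time it runs, you may prefer to guard it so that it only runs when the package is missing. This is a minimal sketch of a common pattern, not something used in the original post; the CRAN mirror URL passed to repos is an assumption you can swap for any mirror you like.

```r
# Install gapminder only if it is not already installed.
# repos is set explicitly so this also works in non-interactive sessions
# (the cloud mirror below is just one possible choice of CRAN mirror).
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder", repos = "https://cloud.r-project.org")
}

# Loading is still needed in every new R session
library(gapminder)
```

Installing is a one-time step per computer, while library() has to be run each time you start R.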
Line charts are great for showing data over time, such as year-wise data. When you explore the gapminder dataset, you will notice that it contains year-wise data for many countries:
library(tidyverse)  # glimpse(), filter() and ggplot() all come from the tidyverse
glimpse(gapminder)
Rows: 1,704
Columns: 6
$ country <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afgh…
$ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, As…
$ year <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 19…
$ lifeExp <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39…
$ pop <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14…
$ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, …
View(gapminder)  # opens the full dataset in a spreadsheet-style viewer (a new tab in RStudio)
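glimpse() and View() show the data row by row. To check the year-wise structure directly, a short sketch like the following (assuming the dplyr and gapminder packages are installed) lists the years covered and counts the rows per country:

```r
library(dplyr)      # n_distinct(), count() and %>%
library(gapminder)  # the gapminder data frame

# Years covered: every five years from 1952 to 2007
years <- sort(unique(gapminder$year))
print(years)

# Number of distinct countries in the dataset
n_countries <- n_distinct(gapminder$country)
print(n_countries)

# Rows per country: each country has one row per year
rows_per_country <- gapminder %>% count(country)
head(rows_per_country)
```

Each country has 12 rows (one per year), and 142 countries times 12 years gives the 1,704 rows reported by glimpse().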
For today, we’ll focus on India as an example, which means we need to filter the data before plotting:
gapminder %>% filter(country == "India")
# A tibble: 12 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 India Asia 1952 37.4 372000000 547.
2 India Asia 1957 40.2 409000000 590.
3 India Asia 1962 43.6 454000000 658.
4 India Asia 1967 47.2 506000000 701.
5 India Asia 1972 50.7 567000000 724.
6 India Asia 1977 54.2 634000000 813.
7 India Asia 1982 56.6 708000000 856.
8 India Asia 1987 58.6 788000000 977.
9 India Asia 1992 60.2 872000000 1164.
10 India Asia 1997 61.8 959000000 1459.
11 India Asia 2002 62.9 1034172547 1747.
12 India Asia 2007 64.7 1110396331 2452.
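Did you notice the double equal sign (==) in the filter() command? In R, == compares two values and returns TRUE or FALSE, while a single = assigns a value (it is also used to name function arguments, as in aes(x = year)). But we used <- for assignment on Day 1, right? That’s because R is flexible and allows you to use either <- or = for assignment. For filtering, however, you need a comparison, so you must use ==: filter() keeps only the rows where the comparison is TRUE. If you accidentally write filter(country = "India"), dplyr stops with an error hinting that you probably meant ==.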
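To see what == actually produces, here is a small base-R sketch using a made-up data frame (toy and its values are hypothetical, chosen only for illustration). The comparison returns one TRUE or FALSE per row, and keeping the TRUE rows is the same idea behind filter():

```r
# A tiny made-up data frame (hypothetical values, not real gapminder data)
toy <- data.frame(
  country = c("India", "China", "India", "Brazil"),
  year    = c(1952, 1952, 1957, 1952)
)

# == compares each element with "India" and returns a logical vector
is_india <- toy$country == "India"
print(is_india)  # TRUE FALSE TRUE FALSE

# Keep only the rows where the comparison is TRUE
# (a base-R counterpart of filter(country == "India"))
india_rows <- toy[is_india, ]
print(india_rows)

# A single = assigns a value, just like <-; it does not compare
threshold = 1955
threshold == 1955  # comparison: TRUE
```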
Let’s make the line chart now. We will plot GDP per capita over time for India.
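gapminder %>%
  filter(country == "India") %>%          # keep only the rows for India
  ggplot(aes(x = year, y = gdpPercap)) +  # year on the x-axis, GDP per capita on the y-axis
  geom_line() +                           # connect the points in order of year
  labs(title = "GDP per Capita Over Time in India",
       caption = "Source: Gapminder",
       x = "Year",
       y = "GDP per Capita")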
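We start with the gapminder dataset and filter it to keep only the rows where the country is India. The filter() function comes from the dplyr package (Wickham et al. 2023), which is part of the tidyverse. The pipe (%>%) passes the result of each step on to the next, so the code reads from top to bottom. With dplyr for wrangling the data and ggplot2 for plotting it, loading the tidyverse gives you everything this chart needs. Now you know why the tidyverse is so versatile and useful!
Finally, we pass the filtered data to ggplot() and draw a line chart of GDP per capita over the years with geom_line(). Note that ggplot layers are joined with +, not %>%.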
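If the %>% pipe still feels a little magical, this sketch (assuming dplyr and gapminder are installed) shows that it simply passes the result on its left as the first argument of the function on its right, so the piped and nested versions return the same rows:

```r
library(dplyr)      # filter() and the %>% pipe
library(gapminder)  # the gapminder data frame

# Piped: take gapminder, then filter it
india_piped <- gapminder %>% filter(country == "India")

# Nested: gapminder is passed directly as filter()'s first argument
india_nested <- filter(gapminder, country == "India")

identical(india_piped, india_nested)  # TRUE
nrow(india_piped)                     # 12, matching the tibble printed earlier
```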
As usual, you can customize the labels with labs(). If you’ve forgotten how, feel free to consult Day 4, where we covered this.
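One thing worth knowing: gapminder has only one observation every five years, so geom_line() joins them with straight segments. A minimal variation (not part of the original chart, and assuming the dplyr, ggplot2 and gapminder packages are installed) adds geom_point() so the actual data points are visible on top of the line:

```r
library(dplyr)      # filter() and %>%
library(ggplot2)    # ggplot(), geom_line(), geom_point(), labs()
library(gapminder)  # the gapminder data frame

india_plot <- gapminder %>%
  filter(country == "India") %>%
  ggplot(aes(x = year, y = gdpPercap)) +
  geom_line() +   # straight segments between consecutive observations
  geom_point() +  # one dot per actual observation (12 in total)
  labs(title = "GDP per Capita Over Time in India",
       caption = "Source: Gapminder",
       x = "Year",
       y = "GDP per Capita")

print(india_plot)  # draws the chart on the active graphics device
```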
Here’s a quick recap of today:
1. Filtering datasets: you have learned how to extract specific rows of interest with filter().
2. Creating line charts with geom_line(): we’ve applied the geom_line() function to visualize trends over time.
That’s all for today! Tomorrow, we’ll explore bar charts. Until then, happy coding!
For attribution, please cite this work as
Soundararajan (2024, Oct. 8). My R Space: Day 8 of viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-08-day-8-of-viz-with-me/
BibTeX citation
@misc{soundararajan2024day, author = {Soundararajan, Soundarya}, title = {My R Space: Day 8 of viz with me}, url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-08-day-8-of-viz-with-me/}, year = {2024} }