Day 8 of viz with me

Data viz Beginner dplyr ggplot lineplot filter DataViz Challenge

Today we draw a line plot and learn to filter the data.


Author: Soundarya Soundararajan

Published: Oct. 7, 2024

Citation: Soundararajan, 2024


Welcome to Day 8 of “Viz with Me”!

Yesterday, we learnt how to use shapes to represent groups in a plot.

Goals for today:

  1. Draw a line plot

  2. Learn to filter the data to keep only the rows with a particular value of a variable

Today, we’re taking a slight turn and using the gapminder dataset to draw a line chart. This will also give us a chance to revisit the installation and loading of libraries, which we covered on Day 1.

install.packages("gapminder")
library(gapminder)
library(tidyverse)
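Installing a package is needed only once per machine, while loading it is needed in every new R session. A minimal sketch of that idea is below; it assumes a CRAN mirror is configured and is just one common way to avoid reinstalling on every run.

```r
# Install gapminder only if it is not already available on this machine.
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder")
}

# Loading still has to happen in every new R session.
library(gapminder)
library(tidyverse)
```

If you run the script a second time, the install step is skipped and only the library() calls do any work.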

Typically, line charts are great for representing data over time (like year-wise data). When you explore the gapminder dataset, you will notice it contains year-wise data for many countries:

glimpse(gapminder)
Rows: 1,704
Columns: 6
$ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afgh…
$ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, As…
$ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 19…
$ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39…
$ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14…
$ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, …
View(gapminder)
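To check the "year-wise data for many countries" structure for yourself, a quick sketch like the one below can help. The object names n_countries and years are made up for this illustration.

```r
library(gapminder)
library(tidyverse)

# How many distinct countries are in the data?
n_countries <- n_distinct(gapminder$country)
n_countries   # 142

# Which years were recorded?
years <- sort(unique(gapminder$year))
years         # 1952 to 2007, in steps of five years
```

With 142 countries and 12 recorded years each, we get the 1,704 rows that glimpse() reported.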

For today, we’ll focus on India as an example, which means we need to filter the data before plotting.

gapminder %>%
  filter(country == "India") # this is how you filter data in R
# A tibble: 12 × 6
   country continent  year lifeExp        pop gdpPercap
   <fct>   <fct>     <int>   <dbl>      <int>     <dbl>
 1 India   Asia       1952    37.4  372000000      547.
 2 India   Asia       1957    40.2  409000000      590.
 3 India   Asia       1962    43.6  454000000      658.
 4 India   Asia       1967    47.2  506000000      701.
 5 India   Asia       1972    50.7  567000000      724.
 6 India   Asia       1977    54.2  634000000      813.
 7 India   Asia       1982    56.6  708000000      856.
 8 India   Asia       1987    58.6  788000000      977.
 9 India   Asia       1992    60.2  872000000     1164.
10 India   Asia       1997    61.8  959000000     1459.
11 India   Asia       2002    62.9 1034172547     1747.
12 India   Asia       2007    64.7 1110396331     2452.

Did you notice the double equal sign (==) in the filter command? In R, == is the comparison operator: it tests whether two values are equal. A single equal sign (=) is used for assignment and for naming function arguments. But we did use <- for assignment in Day 1, right? That’s because R is flexible and allows you to use either <- or = for assignment. Inside filter(), though, you must use == to compare values.
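Here is a small sketch of the difference between assigning and comparing. The countries vector is made up for illustration and is not taken from gapminder.

```r
# A small made-up vector, just for illustration.
countries <- c("India", "China", "India", "Brazil")

# <- (or =) stores a value; == asks a question and answers TRUE or FALSE.
is_india <- countries == "India"
is_india              # TRUE FALSE TRUE FALSE

# Keeping only the TRUE positions is what filter() does, row by row.
countries[is_india]   # "India" "India"
```

In other words, filter(country == "India") keeps exactly those rows where the comparison comes out TRUE.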

Let’s make the line chart now. We will plot GDP per capita over time for India.

gapminder %>%
  filter(country == "India") %>%
  ggplot(aes(x = year, y = gdpPercap)) +
  geom_line() +
  labs(title = "GDP per Capita Over Time in India",
       caption = "Source: Gapminder",
       x = "Year",
       y = "GDP per Capita")

How would this code speak?

We start by taking the gapminder dataset. We then filter the data to keep only the rows where the country is India. The filter() function comes from the dplyr package, which is part of the tidyverse. Now you know why the tidyverse is so versatile and useful!

Finally, we create a line chart plotting GDP per capita over the years using ggplot.
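If the single pipeline feels like a lot at once, the same steps can be written one at a time. This is only a sketch of an equivalent way to write it; the names india and india_plot are made up for this example.

```r
library(gapminder)
library(tidyverse)

# Step 1: keep only the rows for India and store them under a name.
india <- gapminder %>%
  filter(country == "India")

# Step 2: map year to the x-axis and GDP per capita to the y-axis,
# then draw the line.
india_plot <- ggplot(india, aes(x = year, y = gdpPercap)) +
  geom_line()

india_plot  # printing the object draws the chart
```

Storing the filtered data first also lets you inspect it (for example with glimpse(india)) before you plot it.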

As usual, you can customize the labels with labs(). If you’ve forgotten how, feel free to consult Day 4, where we covered this.

Key Takeaways

  1. Filtering datasets – You have learned how to extract specific rows of interest.

  2. Creating line charts with geom_line() – We’ve applied the geom_line() function to visualize trends over time.

That’s all for today! Tomorrow, we’ll explore bar charts. Until then, happy coding!

Line plot Photo Credit: Edgar Hernandez

    References

    Bryan, Jennifer. 2023. “Gapminder: Data from Gapminder.” https://CRAN.R-project.org/package=gapminder.
    Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. “Dplyr: A Grammar of Data Manipulation.” https://CRAN.R-project.org/package=dplyr.

    Citation

    For attribution, please cite this work as

    Soundararajan (2024, Oct. 8). My R Space: Day 8 of viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-08-day-8-of-viz-with-me/

    BibTeX citation

    @misc{soundararajan2024day,
      author = {Soundararajan, Soundarya},
      title = {My R Space: Day 8 of viz with me},
      url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-08-day-8-of-viz-with-me/},
      year = {2024}
    }