Day 2 of Viz with me

Data viz Beginner ggplot2 penguins DataViz Challenge

We create a canvas today to plot tomorrow.

Soundarya Soundararajan
2024-10-02

Welcome to Day 2 of “Viz with Me” in R!

On day 0, we installed R and RStudio.
On day 1, we installed the necessary packages and loaded the penguins data.

Goal for today: learn to create a canvas in R.

Let’s get started!

Imagine you’ve collected some data, and I ask you to find the relationship between two variables, X and Y, in your dataset. What would you do? You might run a correlation or plot a simple scatterplot, with X on the x-axis and Y on the y-axis.

Like this
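In R, a first pass at both ideas could look like the small sketch below. The x and y values are made up purely for illustration (they are not from any real dataset), and the sketch uses only base R functions, cor() and plot(), so nothing needs to be installed for it.

```r
# Made-up values for illustration only; these are not from any real dataset.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

# Pearson correlation between x and y (a single number between -1 and 1).
r <- cor(x, y)
print(r)

# Base R scatterplot with x on the x-axis and y on the y-axis.
plot(x, y)
```

The rest of this series builds the same kind of picture with ggplot2 instead of plot().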

To do a similar plot in R, there are three components we need to break down:

  1. Data: We introduced you to the dataset yesterday. Do you remember its name? Yes, it’s the penguins dataset from the Palmer Penguins package.

  2. Defining X and Y: We need to inform R what the X and Y variables are. These are specified under the aes() function in ggplot.

Here’s a glimpse of how it will look:

ggplot(aes(x = , y = ))

We first use ggplot(), then the aes() function to introduce X and Y. This is the standard approach. We have not specified what X and Y are yet; we will do that in the next step.

If this code could speak, it would say: “Hey R, take the ggplot2 package and use the aesthetics function to plot X and Y.”

  3. Drawing the scatterplot: This is done using geoms, which we’ll explore tomorrow.

For today, we’ll focus on the first two steps.
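If you are curious what aes() does on its own, here is a small sketch. It assumes ggplot2 is installed (it comes with the tidyverse), uses two column names from the penguins data that we pick later in this post, and stores the result in an object called mapping, which is just a name chosen for this example.

```r
library(ggplot2)

# aes() only records which columns should play the roles of x and y.
# It does not look anything up yet, so no data is needed at this point.
mapping <- aes(x = bill_length_mm, y = body_mass_g)

# List the aesthetics that were declared.
names(mapping)
```

Nothing is drawn here. The mapping is only a note to ggplot about which column goes where.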

Step 1: Access the X and Y from the data

Let’s revisit the script from yesterday.

Run the following code to load the penguins dataset and to wake up the tidyverse package:
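A minimal version of that script might look like the sketch below. It assumes the two packages installed on Day 1 were tidyverse and palmerpenguins; if your script from yesterday looks a little different, use yours.

```r
# Assumes both packages were installed on Day 1.
library(tidyverse)       # loads ggplot2 and the %>% pipe, among others
library(palmerpenguins)  # provides the penguins dataset
```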

But where is the data? We need to extract X and Y from the penguins dataset. Let’s assume x is bill length and y is body mass from the penguins dataset. We’re looking to find the relationship between them.

If you remember from yesterday, you’d access a variable like this:

penguins$bill_length_mm

That’s how you access the bill_length_mm variable, but it’s much simpler within ggplot. First, we call the data, like this:

penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm
   <fct>   <fct>              <dbl>         <dbl>             <int>
 1 Adelie  Torgersen           39.1          18.7               181
 2 Adelie  Torgersen           39.5          17.4               186
 3 Adelie  Torgersen           40.3          18                 195
 4 Adelie  Torgersen           NA            NA                  NA
 5 Adelie  Torgersen           36.7          19.3               193
 6 Adelie  Torgersen           39.3          20.6               190
 7 Adelie  Torgersen           38.9          17.8               181
 8 Adelie  Torgersen           39.2          19.6               195
 9 Adelie  Torgersen           34.1          18.1               193
10 Adelie  Torgersen           42            20.2               190
# ℹ 334 more rows
# ℹ 3 more variables: body_mass_g <int>, sex <fct>, year <int>

Step 2: Informing ggplot

Next, we inform ggplot what the variables are. Here’s how:

penguins
ggplot(aes(x = bill_length_mm, y = body_mass_g))

What happens when you run this code?

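Why Doesn’t It Plot Yet?

You might notice that no plot appears: R prints the penguins data and then, in recent versions of ggplot2, stops with an error because ggplot() received aes() where it expected data. Do you know why? The data and the plot are not yet connected because we haven’t told them to “talk” to each other. Remember the pipe (%>%) we discussed yesterday? That’s where it comes in handy!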

You know how to comment out the code, right? Use the # symbol before the code to comment it out. Whatever you learn now, if you want to make notes, do it as a comment in your script.
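As a quick reminder, here is a small sketch of what notes in a script could look like. The wording of the notes and the object name note_demo are made up for this example.

```r
# Day 2 note: aes() tells ggplot which columns are x and y.
# A line that starts with # is ignored when the script runs,
# so the next line is kept for reference but is not executed:
# penguins$bill_length_mm

note_demo <- 1 + 1  # a comment can also follow code on the same line
```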

Step 3: Connecting the data and the plot

Let us use the pipe operator to connect the data and the plot. Here’s how:

penguins %>%
   ggplot(aes(x = bill_length_mm, y = body_mass_g))

What does this code say? It says, “Hey R, take the penguins dataset, and then use ggplot() with the aesthetics function; here are the X and Y variables.”
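To see what the pipe is doing for us, it may help to compare it with a version that names the data explicitly. The sketch below assumes tidyverse and palmerpenguins are installed. The object names canvas_piped and canvas_direct are made up for this example, and looking inside a plot object with $data is only a peek for curiosity, not something you need day to day.

```r
library(tidyverse)
library(palmerpenguins)

# With the pipe: penguins is passed along as the first argument (data).
canvas_piped <- penguins %>%
  ggplot(aes(x = bill_length_mm, y = body_mass_g))

# Without the pipe: the same data is named explicitly.
canvas_direct <- ggplot(data = penguins,
                        mapping = aes(x = bill_length_mm, y = body_mass_g))

# Check whether both objects hold the same dataset.
identical(canvas_piped$data, canvas_direct$data)
```

Both spellings should set up the same canvas. The pipe simply lets us read the code from left to right: data first, then the plot.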

Now, run this – your canvas is ready!

You should see an empty panel with bill length on the x-axis and body mass on the y-axis. If you don’t see any points yet, don’t worry – we’ll “draw” the scatterplot tomorrow.

Until then, ggplot will eagerly wait for us to tell it which plot we want to draw.
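If you want to convince yourself that the canvas really is waiting, one way is to store it in an object and count its layers. This is only a sketch: it assumes tidyverse and palmerpenguins are installed, canvas is just a name chosen for this example, and the $layers peek relies on how ggplot2 plot objects are usually laid out.

```r
library(tidyverse)
library(palmerpenguins)

# Store the canvas in an object instead of printing it right away.
# `canvas` is just a name chosen for this example.
canvas <- penguins %>%
  ggplot(aes(x = bill_length_mm, y = body_mass_g))

# No geoms have been added yet, so the list of layers should be empty.
length(canvas$layers)

# Printing the object draws the empty canvas: axes and grid, no points.
print(canvas)
```

Tomorrow's geom will become the first layer on this canvas.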

Don’t forget to save your script.

Happy holidays, and happy plotting!

Jump ahead to Day 3 to draw the scatterplot.

Citation

For attribution, please cite this work as

Soundararajan (2024, Oct. 2). My R Space: Day 2 of Viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-02-day-2-of-viz-with-me/

BibTeX citation

@misc{soundararajan2024day,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Day 2 of Viz with me},
  url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-02-day-2-of-viz-with-me/},
  year = {2024}
}