Day 1 of Viz with me

Data viz Beginner R scripts DataViz Challenge

We kickstart our first day in the series with R script creation, installing key packages, and organizing the code.

Soundarya Soundararajan
2024-10-01

Welcome to Day 1 of “Viz with Me” in R!

Goals for today:

  1. Learn how to create and work with scripts in R.
  2. Get familiar with the data we will be using.

If you are arriving here for the first time, check out yesterday’s post for a quick overview of what we are doing here.

Getting Started with Scripts

Once you’ve installed R and RStudio, follow these steps to create a new script:

Go to File → New File → R Script, or simply press Ctrl + Shift + N. Save your script so you can return to it later.

Saving the scripts

What Is a Script?

A script is a file where you write your code, allowing you to run commands line by line or all at once. You can save your scripts for future use, making them a great tool for replicating tasks.

Photo by Pixabay

A script in simple terms:

Think of it as your coding notebook, storing all your code.

Creating Sections in Your Script

It’s helpful to organize your script using sections, especially for larger projects. To create a section, press Ctrl + Shift + R, or add one manually by typing a comment line that ends with at least four dashes (for example, # Section name ----).

Sections are foldable: the small arrow at the beginning of the section lets you expand and collapse it. They keep your script neat and easy to navigate.
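As a rough sketch, a script with two sections might look like this. The section names are made up for illustration; what matters is that each header is a comment ending in at least four dashes, which RStudio treats as a foldable section.

```r
# Load packages ----
# (library() calls would go here)

# Explore the data ----
n_rows <- nrow(mtcars)  # mtcars is a dataset that ships with R
n_rows
```

R itself treats the section headers as ordinary comments, so they never change what the script does.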

Let’s add an intention for this challenge as our first section:

My intention for the next 30 days

Notice the # symbol at the start of the line? In R, anything following # is a comment—R won’t run it. Comments are useful for adding notes or explanations.
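Here is a small sketch of how comments and code sit side by side. The wording of the intention is only a placeholder, so write your own.

```r
# My intention for the next 30 days ----
# Practice one small visualization every day. (placeholder text)

# A comment on its own line is skipped entirely.
total <- 2 + 3  # a comment can also follow code on the same line
total
```

Only the two lines with `total` are run; everything after a # is ignored.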

Script meaning upgraded: Think of your script as notes to yourself, written as comments, combined with instructions for R to run, written as code. Alongside the code, it can hold your thoughts, ideas, and even your intentions.

Preparing for Tomorrow’s Exercises

Before we dive into dataviz tomorrow, there are a few key things to familiarize yourself with:

Data for Plotting

To create visualizations, we need data. For simplicity, we will use built-in datasets, which saves us from the hassle of data import at this stage. Tomorrow’s examples will use the penguins dataset from the palmerpenguins package. If you haven’t yet, take a moment to familiarize yourself with this dataset here.

The Pipe Operator (%>% vs. |>)

In R, there’s a useful operator called the “pipe,” which helps connect your code. Imagine the pipe operator as an actual pipe.

What does a pipe do? It connects. Similarly, the pipe operator connects your code by passing the result of one step on to the next.

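In my examples, I will be using the native pipe (|>), which needs R 4.1 or later, but if you’re accustomed to the older magrittr pipe (%>%), feel free to adapt the code accordingly. Whichever you pick, remember the keyboard shortcut: Ctrl + Shift + M. Don’t type the pipe out, just use the shortcut. In RStudio the shortcut inserts %>% by default; to get |> instead, turn on the native pipe option under Tools → Global Options → Code.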
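To see the pipe in action before we touch any plots, here is a minimal sketch using only base R. The vector `values` is made up for illustration.

```r
values <- c(1, 4, 9)

# Without the pipe: read from the inside out.
nested <- sum(sqrt(values))

# With the native pipe (R 4.1 or later): read from left to right.
piped <- values |> sqrt() |> sum()

identical(nested, piped)
```

Both versions do the same work. The piped version simply reads in the order the steps happen: take the values, then the square roots, then the sum.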

Functions and Packages

Functions are tools in R that perform tasks for you. For example, to calculate the mean of a variable in a dataset, instead of manually adding and dividing numbers, you can simply use:

mean(dataset$variable_name)

This line of code tells R to calculate the mean of a specific variable from your dataset. Notice that the $ symbol is used to reference a variable within a dataset. If you were curious and tried this in R, it will not work as written, because dataset and variable_name are placeholders and we haven’t loaded a dataset yet. We will do that tomorrow.
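If you would like something that runs today, here is a small sketch with a made-up data frame. The column `height_cm` and its values are invented for illustration. It also shows what happens when a value is missing, which will matter for the penguins data.

```r
# A tiny made-up dataset, only for illustration.
dataset <- data.frame(height_cm = c(150, 160, 170, NA))

# With a missing value present, mean() returns NA.
mean(dataset$height_cm)

# na.rm = TRUE tells mean() to ignore missing values.
avg_height <- mean(dataset$height_cm, na.rm = TRUE)
avg_height
```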

A package in R is a collection of functions bundled together. palmerpenguins is a package that contains the penguins dataset, which we will use for our visualizations.

Installing and Using Packages

To use the penguins dataset, you need to install the palmerpenguins package, along with the tidyverse package for visualization tools like ggplot2. Install these packages by running:

install.packages("palmerpenguins")
install.packages("tidyverse")

Make sure you’re connected to the internet during installation.

Where to do this? Remember the script we created? Write this code in the script, place the cursor anywhere on a line, and press Ctrl + Enter to run that line. To run several lines at once, highlight them and press Ctrl + Enter.

Once installed, you can load the packages (some call this “calling” the package) into your R session by using:

library(tidyverse) # for visualization tools
library(palmerpenguins) # for the penguins dataset

Someone asked me recently how to tell whether the packages are installed. If R reports that there is no package with that name when you run the code above, the package isn’t installed, so make sure the installation steps completed successfully. Also, did you notice the comments after each line of code? They are for your reference and are ignored by R.
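If you prefer to check without triggering an error, one possible approach is base R's `requireNamespace()`, which returns TRUE or FALSE. The helper name `is_installed` below is my own, not something from a package.

```r
# TRUE if a package is installed, FALSE otherwise.
# Nothing is attached or installed by this check.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

needed <- c("tidyverse", "palmerpenguins")

for (pkg in needed) {
  if (is_installed(pkg)) {
    message(pkg, " is installed")
  } else {
    message(pkg, " is missing; run install.packages(\"", pkg, "\")")
  }
}
```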

Exploring the Penguins Dataset

To view the dataset in the console, simply type:

penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm
   <fct>   <fct>              <dbl>         <dbl>             <int>
 1 Adelie  Torgersen           39.1          18.7               181
 2 Adelie  Torgersen           39.5          17.4               186
 3 Adelie  Torgersen           40.3          18                 195
 4 Adelie  Torgersen           NA            NA                  NA
 5 Adelie  Torgersen           36.7          19.3               193
 6 Adelie  Torgersen           39.3          20.6               190
 7 Adelie  Torgersen           38.9          17.8               181
 8 Adelie  Torgersen           39.2          19.6               195
 9 Adelie  Torgersen           34.1          18.1               193
10 Adelie  Torgersen           42            20.2               190
# ℹ 334 more rows
# ℹ 3 more variables: body_mass_g <int>, sex <fct>, year <int>

For a clearer view, you can use:

View(penguins)

This will pop up a new tab with the dataset displayed in rows and columns.

To list all variable names, type:

names(penguins)
[1] "species"           "island"            "bill_length_mm"   
[4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
[7] "sex"               "year"             

What you see is a list of variables in the dataset. These variables will be crucial for creating visualizations, so take a moment to familiarize yourself with them.
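Did you spot the NA values in row 4 of the output above? Those are missing measurements. As a rough sketch, a small helper like the one below counts rows, columns, and missing values per variable. The name `quick_look` is hypothetical, and I demonstrate it on `airquality`, a dataset that ships with R, so it runs even before the packages are installed.

```r
# Hypothetical helper: a quick summary of any data frame.
quick_look <- function(df) {
  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    n_missing = colSums(is.na(df))  # missing values per variable
  )
}

# airquality ships with R, so this runs without extra packages.
quick_look(airquality)

# With palmerpenguins loaded, the same call works on the penguins data:
# quick_look(penguins)
```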

Wrapping Up for Today

Ensure that R and RStudio are properly installed, and that you’ve successfully installed and loaded the necessary packages, tidyverse and palmerpenguins (Horst, Hill, and Gorman 2020; Wickham et al. 2019). Your script should look something like this:
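A minimal sketch of such a script, assembled from the steps in this post. The section names and their order are assumptions, so yours may differ, and it assumes both packages have already been installed once.

```r
# My intention for the next 30 days ----
# (write your own intention here as a comment)

# Install packages (run once, then keep commented out) ----
# install.packages("palmerpenguins")
# install.packages("tidyverse")

# Load packages ----
library(tidyverse)       # for visualization tools
library(palmerpenguins)  # for the penguins dataset

# Explore the data ----
penguins
names(penguins)
```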

Note: If you are observant, you might notice the # before the installation code. Guess why? Tomorrow, when you run the script, R will ignore the installation code, as it’s commented out. This way, you won’t have to reinstall the packages every time you run the script.

Make sure to save the script, since you will be building on it as we progress. Tomorrow, we will start by creating a canvas and adding our first plots. Yay!

If you’re eager to explore more, consider looking into project creation in RStudio to dedicate a workspace for this 30-day challenge. Even if you prefer to stick with a single script, using it consistently will help you build and refine your code throughout the series.

Have you explored the variables in the penguins dataset? If not, take some time to do so today, as they will be crucial for tomorrow’s exercises.

And, say hi to the penguins for me!

Artwork by allison_horst

That is all for today. I will see you tomorrow with the first visualization challenge.

Jump ahead to Day 2 of the series, where you draw a plain canvas and prepare for the plots: Day 2 of Viz with Me

References

Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. 2020. “Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data.” https://doi.org/10.5281/zenodo.3960218.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Citation

For attribution, please cite this work as

Soundararajan (2024, Oct. 1). My R Space: Day 1 of Viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-01-day-1-of-viz-with-me/

BibTeX citation

@misc{soundararajan2024day,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Day 1 of Viz with me},
  url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-01-day-1-of-viz-with-me/},
  year = {2024}
}