We kickstart our first day in the series with R script creation, installing key packages, and organizing the code.
Welcome to Day 1 of “Viz with Me” in R!
Goals for today:
If you are arriving here for the first time, check out yesterday’s post for a quick overview of what we are doing here.
Once you’ve installed R and RStudio, follow these steps to create a new script:
Go to File → New File → R Script, or simply press Ctrl + Shift + N. Save your script so you can return to it later.
A script is a file where you write your code, allowing you to run commands line by line or all at once. You can save your scripts for future use, which makes them a great tool for replicating tasks.
A script in simple terms:
Think of it as your coding notebook, storing all your code.
It’s helpful to organize your script using sections, especially for larger projects. To create a section, press Ctrl + Shift + R, or manually add one by typing a comment line that ends in four or more dashes.
Sections are foldable: click the small arrow at the beginning of a section to expand or collapse it. They keep your script neat and easy to navigate.
Let’s add an intention for this challenge as our first section:
Notice the # symbol at the start of the line? In R, anything following # is a comment, so R won’t run it. Comments are useful for adding notes or explanations.
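As a minimal sketch of the two ideas above, here is how a section header and a few comments might sit in a script. The section titles and the intention text are placeholders of mine, not the author's wording. RStudio treats a comment line ending in four or more dashes as a foldable section.

```r
# My intention for this challenge ----
# Placeholder text: learn one new plotting idea every day.

# A first line of code ----
days_in_challenge <- 30  # trailing comment: R runs only the part before '#'
# days_in_challenge <- 0   (this whole line is a comment, so R skips it)
print(days_in_challenge)
```

Running the script leaves days_in_challenge at 30, because the second assignment is commented out.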
Script meaning upgraded: Think of your script as notes for yourself, written as comments, plus instructions for R to run, written as code. It can hold your thoughts, ideas, and even your intentions in addition to the code.
Before we dive into dataviz tomorrow, there are a few key things to familiarize yourself with:
To create visualizations, we need data. For simplicity, we will use built-in datasets, which saves us from the hassle of data import at this stage. Tomorrow’s examples will use the penguins dataset from the palmerpenguins package. If you haven’t yet, take a moment to familiarize yourself with this dataset here.
In R, there’s a useful operator called the “pipe,” which helps connect pieces of code. Imagine the pipe operator as a physical pipe.
What does a pipe do? It connects. Similarly, the pipe operator connects your code from one line to the next: it passes the result of one step to the next step as its input.
In my examples, I will be using the native pipe (|>), but if you’re accustomed to the older magrittr pipe (%>%), feel free to adapt the code accordingly. Whichever you pick, remember the keyboard shortcut: Ctrl + Shift + M. Don’t type the pipe out; just use the shortcut.
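To make the idea concrete, here is a small sketch that writes the same calculation twice, once nested and once with the native pipe. It uses only base R and made-up numbers, and it assumes R 4.1 or later, where |> is available.

```r
# Without the pipe: read from the inside out
nested <- sum(sqrt(c(1, 4, 9)))

# With the native pipe: read from top to bottom
piped <- c(1, 4, 9) |>
  sqrt() |>
  sum()

print(piped)
identical(nested, piped)  # both versions compute the same value
```

Each |> hands the result on its left to the function on its right as the first argument, so the piped version reads in the order the steps happen.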
Functions are tools in R that perform tasks for you. For example, to calculate the mean of a variable in a dataset, instead of manually adding and dividing numbers, you can simply use:
mean(dataset$variable_name)
This command (or code, whichever we call it) tells R to calculate the mean of a specific variable from your dataset. Notice that the $ symbol is used to reference a variable within a dataset. If you are curious and try this in R, it will not work yet, because dataset and variable_name are placeholders and we haven’t loaded a dataset. We will get to that soon.
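If you want something that runs right away, here is a sketch with a tiny made-up data frame. The name toy and its values are hypothetical and are not part of the penguins data. It also shows one thing worth knowing early: mean() returns NA when a value is missing, unless you pass na.rm = TRUE.

```r
# A tiny, made-up data frame standing in for a real dataset
toy <- data.frame(
  name   = c("a", "b", "c", "d"),
  mass_g = c(3700, 3800, NA, 3450)
)

# `$` pulls one column (variable) out of the data frame
toy$mass_g

# mean() gives NA if any value is missing ...
mean_with_na <- mean(toy$mass_g)

# ... unless we ask it to drop missing values first
mean_without_na <- mean(toy$mass_g, na.rm = TRUE)

print(mean_with_na)
print(mean_without_na)
```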
A package in R is a collection of functions bundled together. palmerpenguins is a package that contains the penguins dataset, which we will use for our visualizations.
To use the penguins dataset, you need to install the palmerpenguins package, along with the tidyverse package for visualization tools like ggplot2. Install these packages by running:
install.packages("palmerpenguins")
install.packages("tidyverse")
Make sure you’re connected to the internet during installation.
Where to do this? Remember the script we created? Write this code in the script, place the cursor anywhere on a line, and press Ctrl + Enter to run that line. Or highlight all the code and press Ctrl + Enter to run it all at once.
Once installed, you can load the packages (some say calling the package) into your R session by using:
library(tidyverse) # for visualization tools
library(palmerpenguins) # for the penguins dataset
Someone asked me recently how to know whether the packages are installed. If R doesn’t recognize the packages when you run the code above, they aren’t installed, so make sure the installation steps completed successfully. Also, did you notice the comments after each line of code? They are for your reference, and R ignores them.
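Another way to answer the "are they installed?" question is to ask R directly. The sketch below is one possible check, not the only one. The variable names needed, installed, and missing_pkgs are mine. It only reports and does not install anything.

```r
# Packages this series relies on
needed <- c("tidyverse", "palmerpenguins")

# requireNamespace() returns TRUE if a package can be loaded and FALSE
# otherwise; it does not attach the package or install anything.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
print(installed)

missing_pkgs <- needed[!installed]
if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All set: both packages are installed.")
}
```

If a package shows up as FALSE, run the matching install.packages() line and try again.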
To view the dataset in the console, simply type:
penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm
   <fct>   <fct>              <dbl>         <dbl>             <int>
 1 Adelie  Torgersen           39.1          18.7               181
 2 Adelie  Torgersen           39.5          17.4               186
 3 Adelie  Torgersen           40.3          18                 195
 4 Adelie  Torgersen           NA            NA                  NA
 5 Adelie  Torgersen           36.7          19.3               193
 6 Adelie  Torgersen           39.3          20.6               190
 7 Adelie  Torgersen           38.9          17.8               181
 8 Adelie  Torgersen           39.2          19.6               195
 9 Adelie  Torgersen           34.1          18.1               193
10 Adelie  Torgersen           42            20.2               190
# ℹ 334 more rows
# ℹ 3 more variables: body_mass_g <int>, sex <fct>, year <int>
For a clearer view, you can use:
View(penguins)
This will pop up a new tab with the dataset displayed in rows and columns.
To list all variable names, type:
names(penguins)
[1] "species" "island" "bill_length_mm"
[4] "bill_depth_mm" "flipper_length_mm" "body_mass_g"
[7] "sex" "year"
What you see is a list of variables in the dataset. These variables will be crucial for creating visualizations, so take a moment to familiarize yourself with them.
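A few base R functions help with getting to know the variables. The sketch below runs on a small stand-in, penguins_head, which I built by hand from the first four rows and five columns printed above so that it works without any packages. With palmerpenguins loaded, the same calls can be pointed at penguins instead.

```r
# Stand-in for `penguins`: first four rows, first five columns of the printout
penguins_head <- data.frame(
  species           = factor(rep("Adelie", 4)),
  island            = factor(rep("Torgersen", 4)),
  bill_length_mm    = c(39.1, 39.5, 40.3, NA),
  bill_depth_mm     = c(18.7, 17.4, 18.0, NA),
  flipper_length_mm = c(181L, 186L, 195L, NA)
)

names(penguins_head)                         # variable names
dim(penguins_head)                           # number of rows, then columns
str(penguins_head)                           # type of each variable
na_counts <- colSums(is.na(penguins_head))   # missing values per variable
print(na_counts)
```

Knowing which variables are numbers, which are categories, and where values are missing makes it easier to choose what to plot.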
Ensure that R and RStudio are properly installed, and that you’ve successfully installed and loaded the necessary packages, tidyverse and palmerpenguins (Horst, Hill, and Gorman 2020; Wickham et al. 2019). Your script should look something like this:
Note: If you are observant, you might notice the # before the installation code. Guess why? Tomorrow, when you run the script, R will ignore the installation code because it’s commented out. This way, you won’t have to reinstall the packages every time you run the script.
Make sure to save the script—you will be building on it as we progress. Tomorrow, we will start by creating a canvas and adding our first plots. Yey!!!
If you’re eager to explore more, consider looking into project creation in RStudio to dedicate a workspace for this 30-day challenge. Even if you prefer to stick with a single script, using it consistently will help you build and refine your code throughout the series.
Have you explored the variables in the penguins dataset? If not, take some time to do so today, as they will be crucial for tomorrow’s exercises.
And, say hi to the penguins for me!
That is all for today. I will see you tomorrow with the first visualization challenge.
Jump ahead to Day 2 of the series, where you draw a plain canvas and prepare for the plots: Day 2 of Viz with Me
For attribution, please cite this work as
Soundararajan (2024, Oct. 1). My R Space: Day 1 of Viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-01-day-1-of-viz-with-me/
BibTeX citation
@misc{soundararajan2024day,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Day 1 of Viz with me},
  url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-01-day-1-of-viz-with-me/},
  year = {2024}
}