Importing data into RStudio - a step-by-step approach

Data import Beginner

As a beginner in R, importing data is the first step to master before exploring data visualization and analysis. This blog post takes a step-by-step approach to importing your data into RStudio.

Soundarya Soundararajan
08-01-2021

As a beginner doing data analysis in R, the first thing to master is how to get your data into R. I will go over the process step by step. If you want me to add information on importing other file types, please leave a comment below.

First things first.

There are three major ways to accomplish this.

  1. Import by clicking on the data
  2. Import by selecting File -> Import Dataset
  3. Import from the command line

No method is superior to the others, so take your pick.

Prerequisites before import

  1. Have a separate R project for each analysis project you work on. This helps you in the long run. More on this soon. For example, I created a project called “ForBlog” for this blog post, and everything below happens inside it.
  2. Add your data (in Excel format here) to the project. You can simply copy and paste your Excel file into the folder of the R project you created.
  3. There are ways to clean the data once you import it, but for a beginner, I suggest having a clean Excel sheet before importing it into R. Consult here for some best practices to prepare Excel sheets before R import.
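A quick way to confirm that the file really sits in the project folder is to ask R what it can see. This is a minimal sketch using base R only; the name data.xlsx is a placeholder for your own file, and it assumes you opened RStudio through the project, so the working directory is the project folder.

```r
# Show the folder R is currently working in.
# When RStudio is opened through a project, this is the project folder.
getwd()

# List the Excel files R can see in that folder.
excel_files <- list.files(pattern = "\\.xlsx$")
print(excel_files)

# "data.xlsx" is a placeholder name; replace it with your own file name.
if (file.exists("data.xlsx")) {
  message("Found data.xlsx, ready to import.")
} else {
  message("data.xlsx is not in ", getwd())
}
```

If the file shows up in the list, you can later refer to it by its name alone, without typing the full path.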

Now the 3 ways to import data

1. Import by clicking on the data

It is as simple as it sounds.

The data is available in the project folder for import
A single left-click on the data file reveals the options
Click Import Dataset

You get a nice preview of your data. The “Import options” and the code box below it are details we will skip for now. Click Import in the bottom right-hand corner.

Voila!

The data is imported, and you can see it in the “Source” pane. Caution: do not rely on the location in my image, as the four panes are customizable and movable. More here.

You can also confirm that your data is imported by looking at the “Environment” pane, which reads 10 obs. (observations) of 2 variables. Excellent. Now, instead of clicking Import when we previewed the data, we have another option.

Let’s go back to the import preview. Just above the Import button you clicked previously, you see three lines of code.

You can copy and paste these. But wait, do not select and copy them by hand: click the small white button at the top right of the code box while in preview mode. The lines are now copied to your clipboard. Now cancel the dialog, go to your “Console” pane, paste the lines and press Enter to run them.

I always open a script file and paste these lines there, so that I have a record of what I am doing.

In a script, either select all the lines and press Command+Enter (Ctrl+Enter on Windows or Linux), or run them line by line in the same order.

These commands tell R to do the following:

  1. The first line loads the package called “readxl” (because our file is an Excel file).

  2. The second line reads our Excel file and gives the imported data a name. Unless you change the name while previewing, the data is imported under the same name as the file.

  3. The third line instructs R to open the data for you to “View” (note the capital V).
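For reference, the three lines typically look like the sketch below. It is an illustration rather than the exact code from the preview: the file is a demo workbook that ships with readxl (located with readxl_example()) and stands in for your own file, and the object name my_data is just a placeholder. View() is wrapped in interactive() only so that the sketch also runs outside RStudio.

```r
# 1. Load the package that reads Excel files.
library(readxl)

# 2. Read the file and store it under a name of your choice.
#    readxl_example() only locates a demo workbook bundled with readxl;
#    with your own data you would write the file name instead.
my_data <- read_excel(readxl_example("datasets.xlsx"))

# 3. Open the imported data in the viewer (note the capital V).
if (interactive()) View(my_data)
```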

2. Import by selecting File -> Import Dataset

Click File and choose Import Dataset

Unlike before, we see null data here (there is no preview, and the code also mentions NULL)

Don’t be scared; this is because we have not chosen our file yet.

Choose your data

You see that the preview pane is now updated, as before.

3. By command line

Load the readxl package with library(readxl), then use the read_excel function to read the Excel file.

library(readxl)
data <- read_excel("data.xlsx")  # note the direction of the arrow: it points to the new object's name
data_new <- read_excel("data.xlsx", sheet = 2)  # you can specify which sheet to read
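If you are unsure which sheets a workbook contains, readxl can list them before you import. The sketch below again uses the demo workbook bundled with readxl as a stand-in for your own file; the object names are placeholders chosen for this example.

```r
library(readxl)

# Path to a demo workbook bundled with readxl; replace it with your own file.
path <- readxl_example("datasets.xlsx")

# List the sheet names so you know what is available.
sheet_names <- excel_sheets(path)
print(sheet_names)

# Pick a sheet by position or by name; both read the same sheet here.
by_position <- read_excel(path, sheet = 2)
by_name     <- read_excel(path, sheet = sheet_names[2])

# Read only the first five data rows, handy for a quick look at a big file.
first_rows <- read_excel(path, sheet = 2, n_max = 5)
```

Selecting a sheet by name is a little safer than by position, because the import keeps working if someone reorders the sheets in the workbook.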

Importing files other than Excel

Sometimes we might want to import file types other than Excel. From an academic perspective, I can imagine wanting to import SPSS or Stata files.

library(haven)
# read_stata(file = "yourfilename") 
# read_spss(file = "filename")

I use the haven package to import these file types by calling the corresponding function. I have placed the commands after a # (meaning they do not run and are treated as comments) because I do not have any SPSS or Stata file to demonstrate with. When you use these functions, remove the # and run the line. To run a command, write or copy it into the console or an R script, keep the cursor on that line and press Ctrl+Enter (Command+Enter on a Mac).
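If you have no SPSS or Stata file at hand either, one way to try these functions is to write a tiny file first and then read it back. This is only a sketch under that assumption: the data frame demo and its values are made up, and the files go to a temporary location so nothing in your project is touched.

```r
library(haven)

# A tiny made-up data set, used only for this demonstration.
demo <- data.frame(id = 1:3, score = c(4.5, 3.2, 5.0))

# Write it to temporary Stata (.dta) and SPSS (.sav) files.
stata_path <- tempfile(fileext = ".dta")
spss_path  <- tempfile(fileext = ".sav")
write_dta(demo, stata_path)
write_sav(demo, spss_path)

# Import them back, the same way you would import your own files.
stata_data <- read_stata(stata_path)  # read_stata() is an alias of read_dta()
spss_data  <- read_spss(spss_path)    # read_spss() reads .sav files via read_sav()

print(stata_data)
print(spss_data)
```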

I personally use CSV files. If you do too, here is how to import them.

library(readr)
newdata <- read_csv("data2.csv")
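To try this without a file of your own, the sketch below first creates a small CSV in a temporary location and then imports it. The file contents and the object names are made up for illustration; the col_types argument is optional and simply states the type of each column instead of letting read_csv() guess.

```r
library(readr)

# Create a small CSV file in a temporary location (made-up values).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "A,4.5", "B,3.2", "C,5.0"), csv_path)

# Import it, declaring the type of each column explicitly.
csv_data <- read_csv(
  csv_path,
  col_types = cols(name = col_character(), score = col_double())
)

# Quick checks on what was imported.
dim(csv_data)  # number of rows and columns
str(csv_data)  # column names and types
```

Running dim() and str() right after an import is a cheap way to catch surprises, such as numbers that were read in as text.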

In due course, you will find it much easier to write these commands yourself without going through the point-and-click steps. But to begin with, they are essential.

Some questions to ponder

Please feel free to comment on what other file types you use, and I will update this post as needed.

Until then, happy importing!!

Citation

For attribution, please cite this work as

Soundararajan (2021, Aug. 1). My R Space: Importing data into RStudio - a step-by-step approach. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2021-05-13-how-to-import-your-data-into-r/

BibTeX citation

@misc{soundararajan2021importing,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Importing data into RStudio - a step-by-step approach},
  url = {https://github.com/soundarya24/SoundBlog/posts/2021-05-13-how-to-import-your-data-into-r/},
  year = {2021}
}