Section 2 Tips for effective R programming | Rad: R for academics (2024)

Before getting into actual R code, we’ll start with a few notes about how to use it most effectively. Bad coding habits can make your R code difficult to read and understand, so hopefully these tips will ensure you have good habits right from the start.

2.1 Use RStudio’s “projects” feature

Every project you do in R should be set up in its own folder, through RStudio’s “projects” feature. To start a fresh project, go to File -> New project and create a new folder. Reopen the project in RStudio whenever you want to work with it.

When sending your work to other people, you can send them the whole folder and know that they’ll have access to all the required files and scripts.

One of the main benefits is that this sets R’s “working directory” to the project folder by default. Any files you load or save can just be referenced relative to that folder, so instead of:

data = haven::read_spss("R:/Project/2013/Analyses/Regressions/Data/Raw.sav")

You can just do:

data = haven::read_spss("Data/Raw.sav")

This will also work for anyone you send the project to, so you don’t have to worry that the file is in a different location on their machine.
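As a rough sketch of how relative paths resolve against the working directory, the example below uses a temporary folder as a stand-in for a project folder. The folder name, the CSV file, and the explicit setwd() call are assumptions for illustration only; in a real project, RStudio sets the working directory for you when you open the project.

```r
# Stand-in for a project folder (hypothetical, for illustration only)
project_folder = file.path(tempdir(), "ExampleProject")
dir.create(
    file.path(project_folder, "Data"),
    recursive = TRUE,
    showWarnings = FALSE
)

# RStudio does this step for you when you open a project
previous_directory = setwd(project_folder)

# Save and load using paths relative to the project folder
write.csv(head(iris), "Data/Raw.csv", row.names = FALSE)
raw_data = read.csv("Data/Raw.csv")

# The relative path is resolved against the working directory
full_path = normalizePath("Data/Raw.csv")

# Restore the original working directory
setwd(previous_directory)
```

Because the script only ever mentions "Data/Raw.csv", it doesn’t need to change if the whole project folder is moved or copied to another machine.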

2.2 Using scripts

It’s best to put every step of your data cleaning and analysis in a script that you save, rather than making temporary changes in the console.

Ideally, this will mean that you (or anyone else) can run the script from top to bottom, and get the same results every time, i.e. they’re reproducible.

2.2.1 Script layout

Most R scripts I write have the same basic layout:

Loading the libraries I’m using
Loading the data
Changing or analysing the data
Saving the results or the recoded data file

For a larger project, it’s good to create multiple different scripts for each stage, e.g. one file to recode the data, one to run the analyses.

When saving the recoded data, it’s best to save it as a different file: you keep the raw data, and you can recreate the recoded data exactly by rerunning your script.
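The layout described above could look something like the following minimal sketch. It uses only base R so that it runs anywhere; the file names, the tiny example dataset, and the assumed 0-4 scoring of DepressionQ3 are all made up for illustration, and the setup step exists only so the example has a raw file to load.

```r
# --- Setup for this example only: create a small raw data file ---
data_folder = file.path(tempdir(), "Data")
dir.create(data_folder, showWarnings = FALSE)
raw_path = file.path(data_folder, "Raw.csv")
write.csv(
    data.frame(ID = c(1, 2, 3), DepressionQ3 = c(0, 4, 1)),
    raw_path,
    row.names = FALSE
)

# --- 1. Load the libraries ---
# utils is attached by default; it is loaded here only to mark the step
library(utils)

# --- 2. Load the data ---
survey_data = read.csv(raw_path)

# --- 3. Change or analyse the data ---
# Question 3 is reverse coded (assuming a 0-4 scale)
survey_data$DepressionQ3 = 4 - survey_data$DepressionQ3

# --- 4. Save the recoded data as a different file ---
recoded_path = file.path(data_folder, "Recoded.csv")
write.csv(survey_data, recoded_path, row.names = FALSE)
```

The raw file is never overwritten, so rerunning the script from the top always produces the same recoded file.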

2.2.2 A tip for better reproducibility

By default, RStudio will save your current “workspace” when you quit. While convenient, this can mean that you make one-off changes to your data and forget to save that command in your script. Starting with a fresh session every time you open RStudio means you’ll learn to keep every step of your analysis in your script, and you’ll know that you can get back to where you were by rerunning the script.

To disable the default setting, go to Tools -> Global options..., and in the General tab:

Uncheck “Restore .RData”
Set “Save workspace on exit” to “Never”

RStudio’s “save workspace” settings

2.3 Long or wide data

R works better with long data, whereas SPSS generally works better with wide data. Roughly speaking, long data means:

There is one “observation” of each participant/subject per row - e.g. one survey, one session
All the measurements of the same type are in one column - so all the K6 scores, from multiple surveys, would be in a single K6 column, with an additional Time or Survey column that identifies which survey they come from.

Example long data:

ID	Survey	Drinking	AnxietyTotal
1	1	No	6
1	2	Yes	8
2	1	No	5
2	2	No	7

Example wide data:

ID	Drinking_T1	Drinking_T2	AnxietyTotal_T1	AnxietyTotal_T2
1	No	Yes	6	8
2	No	No	5	7

2.3.1 Going from wide to long

If your data is currently in wide format, you may have to reshape it before working with it in R. The new pivot_longer and pivot_wider functions from the tidyr package are good for reshaping data like this. To go from the wide data above to a long dataset:

wide %>%
    tidyr::pivot_longer(
        cols = Drinking_T1:AnxietyTotal_T2,
        names_to = c(".value", "Time"),
        names_sep = "_"
    )
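For a version of the call above that can be run on its own, the sketch below rebuilds the example wide table by hand and calls the tidyr functions directly rather than through the pipe. The object names long and wide_again are made up for illustration, and the trip back to wide format is an addition to show pivot_wider. Note that the new Time column contains "T1" and "T2" rather than the 1 and 2 shown in the example long table.

```r
# Rebuild the example wide data from the table above
wide = data.frame(
    ID = c(1, 2),
    Drinking_T1 = c("No", "No"),
    Drinking_T2 = c("Yes", "No"),
    AnxietyTotal_T1 = c(6, 5),
    AnxietyTotal_T2 = c(8, 7)
)

# Wide to long: the part of each name before "_" becomes the column
# name (".value"), and the part after it goes into the Time column
long = tidyr::pivot_longer(
    wide,
    cols = Drinking_T1:AnxietyTotal_T2,
    names_to = c(".value", "Time"),
    names_sep = "_"
)

# Long back to wide: column names are rebuilt as <measure>_<Time>
wide_again = tidyr::pivot_wider(
    long,
    names_from = Time,
    values_from = c(Drinking, AnxietyTotal)
)
```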

This can get more complicated if the columns haven’t been named consistently, or have multiple pieces of info stored in the name (e.g. t1_male_mean).
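When the names are at least consistent, one possible approach is to split each piece of the name into its own column. The sketch below assumes a hypothetical dataset whose column names follow a time_sex_statistic pattern like t1_male_mean; the data values, the Site column, and the new column names are invented for illustration.

```r
# Hypothetical summary data with three pieces of info in each column name:
# time point, sex, and statistic
summary_wide = data.frame(
    Site = c("A", "B"),
    t1_male_mean = c(5.1, 4.8),
    t1_female_mean = c(5.6, 5.0),
    t2_male_mean = c(6.0, 5.2),
    t2_female_mean = c(6.3, 5.9)
)

# Each part of the name, split on "_", goes into its own column
summary_long = tidyr::pivot_longer(
    summary_wide,
    cols = -Site,
    names_to = c("Time", "Sex", "Statistic"),
    names_sep = "_",
    values_to = "Value"
)
```

If the names aren’t consistent, it’s usually easier to rename the columns first so that they follow a single pattern.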

At the time of writing, the pivot functions were still in beta; they have since been released as part of tidyr 1.0.0.

2.4 Writing readable code

There are two very good reasons to try to write your code in a clear, understandable way:

2.4.1 Basic formatting tips

You can improve the readability of your code a lot by following a few simple rules:

Put spaces between and around variable names and operators (=+-*/)
Break up long lines of code
Use meaningful variable names composed of 2 or 3 words (avoid abbreviations unless they’re very common and you use them very consistently)

These rules can mean the difference between this:

lm1=lm(y~grp+grpTime,mydf,subset=sext1=="m")

and this:

male_difference = lm(
    DepressionScore ~ Group + GroupTimeInteraction,
    data = interview_data,
    subset = BaselineSex == "Male"
)

R will treat both pieces of code exactly the same, but for any humans reading, the nicer layout and meaningful names make it much easier to understand what’s happening, and spot any errors in syntax or intent.

2.4.2 Keeping a consistent style

Try to follow a consistent style for naming things, e.g. using snake_case for all your variable names in your R code, and TitleCase for the columns in your data. Either style is probably better than lowercase with no spacing allmashedtogether.

It doesn’t particularly matter what that style is, as long as you’re consistent. There is a suggested style guide for the tidyverse, but I don’t follow it 100%; I just try to be consistent within my code.

2.4.3 Writing comments

One of the best things you can do to make R code readable and understandable is write comments. R ignores everything from a # to the end of the line, so you can write whatever you want and it won’t affect the way your code runs.

Comments that explain why something was done are great:

# Need to reverse code the score for question 3
data$DepressionQ3 = 4 - data$DepressionQ3

Comments that explain what is being done are less useful. People who already understand R code should be able to tell what is happening just by looking at your code (especially if you’re following the other advice about writing readable code), so these comments can be redundant:

# Calculate the mean of the anxiety scores
anxiety_mean = mean(data$AnxietyTotal)

The exception to this is when you’ve run into something that was tricky to get working, and you need to explain it so other people don’t run into the same issue:

# (Example only, do not run)
# This fails to converge if we don't set the fix_missing option
drinking_model = logit_regression(Drink ~ Group, fix_missing = TRUE)

2.5 Don’t panic: dealing with SPSS withdrawal

2.5.1 RStudio has a data viewer

As you get used to R, you should find that you get more comfortable using the console to check on your data. You can often see a lot of the information you need by printing the first few rows of a dataset to the console. The head() function prints the first 6 rows of a table by default, and you can select the columns that are most relevant to what you’re working on if there are too many:

head(iris[, c("Species", "Petal.Length")])

##   Species Petal.Length
## 1  setosa          1.4
## 2  setosa          1.4
## 3  setosa          1.3
## 4  setosa          1.5
## 5  setosa          1.4
## 6  setosa          1.7

However, you can also use RStudio’s built-in data viewer to get a more familiar, spreadsheet style view of your data. In the Environment pane in the top-right, you can click on the name of any data you have loaded to bring up a spreadsheet view:

Data viewer example

This also supports basic sorting and filtering so you can explore the data more easily (you’ll still need to write code using functions like arrange() or filter() if you want to actually make changes to the data though).
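As a small sketch of what that code might look like, the example below sorts and filters the built-in iris data, assuming the arrange() and filter() functions meant here are the ones from the dplyr package. The object names are made up for illustration.

```r
# Sort by petal length, longest first
iris_sorted = dplyr::arrange(iris, dplyr::desc(Petal.Length))

# Keep only the setosa flowers
setosa_only = dplyr::filter(iris, Species == "setosa")

# In an interactive RStudio session, View() opens the same
# spreadsheet view as clicking the name in the Environment pane:
# View(setosa_only)
```

Unlike sorting in the viewer, these results are saved as new objects, so later steps in your script can use them.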

2.5.2 R can read SPSS files

The haven package can read (and write) SPSS data files, so you can read in existing data:

survey_one = haven::read_spss("Data/Survey1_Final.sav")

R doesn’t deal with “value labels” in the same way as SPSS, and haven tries to keep information about the SPSS value labels available. However, it’s best to just convert everything to R’s way of dealing with categorical variables, i.e. factors, using haven’s as_factor() function:

survey_one = haven::as_factor(survey_one, levels = "both")

The levels = "both" option puts both the numeric value and the text label into the factor labels, like "[0] No", "[1] Yes". As you get more comfortable with R you may want to just use levels = "labels" so you just get the text labels like "No", "Yes".

You may need to convert your data from wide to long, since SPSS prefers wide and R prefers long.

The haven package can also read SAS and Stata data, and there are packages like readxl for Excel files. It’s generally easy to read your data into R from any format designed for tables of data.
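To try the two levels options without an existing SPSS file, the following sketch builds a tiny labelled dataset, writes it out with haven::write_sav(), and reads it back in. The dataset, the file name, and the object names are all hypothetical; only the haven functions themselves are real.

```r
# Hypothetical survey data with an SPSS-style labelled variable
survey_example = data.frame(ID = c(1, 2, 3))
survey_example$Drinking = haven::labelled(
    c(0, 1, 0),
    labels = c(No = 0, Yes = 1)
)

# Write it out as an SPSS file, then read it back in
example_path = file.path(tempdir(), "SurveyExample.sav")
haven::write_sav(survey_example, example_path)
survey_read = haven::read_spss(example_path)

# Convert the labelled columns to factors in the two different ways
survey_both = haven::as_factor(survey_read, levels = "both")
survey_labels = haven::as_factor(survey_read, levels = "labels")

# Compare the resulting factor levels
levels(survey_both$Drinking)
levels(survey_labels$Drinking)
```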

2.6 Here be dragons: the bad parts of R

There are a few tools in R that tend to create more problems than they solve. Unfortunately beginners often end up using them (sometimes because bad tutorials recommend them). My personal list of tools to avoid includes:

attach(): This copies all the individual variables from a dataset into R’s environment, so you can access them with just var_name instead of dataset$var_name. The problems are:
- You end up with a lot of variables in your environment that are hard to keep track of.
- The variables can get out of sync with each other in ways that wouldn’t be possible if they were kept in the dataset.
get() and assign(): These allow you to look up and create variable names using strings, so instead of looking up model3, you can programmatically create the variable name like get(paste0("model", model_num)).
- Again, you can end up with a lot of variables in your environment.
- People often use these when they want to run 100 different versions of a model (there are sometimes good reasons to do this). Instead of creating 100 different variables called model1, model2, …, model100, it’s usually possible to save these in a single list or dataframe. Saving them all in a list means it’s much easier to process and work with the results.
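As one possible way to keep many models in a single list instead of using get() and assign(), the sketch below fits a separate regression for each species in the built-in iris data. The choice of model and the object names are assumptions for illustration only.

```r
# Name the vector so that lapply() returns a list with the same names
species_names = levels(iris$Species)
names(species_names) = species_names

# Fit one model per species and keep them all in a single named list
petal_models = lapply(species_names, function(species_name) {
    species_data = iris[iris$Species == species_name, ]
    lm(Petal.Length ~ Sepal.Length, data = species_data)
})

# Look up a single model by name, instead of get(paste0("model", ...))
setosa_model = petal_models[["setosa"]]

# Process every model in the same way, e.g. pull out each slope
slopes = sapply(petal_models, function(model) {
    coef(model)[["Sepal.Length"]]
})
```

Because everything lives in one list, adding another group or another processing step doesn’t create any new variables in your environment.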
