Introduction

This is a demonstration of R equivalents for common Excel tasks. Specifically, I show six key functions from the dplyr package: select, mutate, filter, summarize, group_by, and arrange.

The data come from the US National Health and Nutrition Examination Survey (NHANES) and were obtained through the NHANES package. It is a sample of 10,000 people, weighted to be representative of the US population.

I used the clean_names function from the janitor package to make the variable names easy to work with.
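To give a sense of what clean_names does, here is a minimal sketch on a toy data frame. The column names and values below are made up for illustration and are not necessarily the ones in the NHANES package; the point is only that mixed-case names become lowercase snake_case.

```r
library(janitor)

# Toy data frame with mixed-case names (illustrative only, not the real data)
raw <- data.frame(SurveyYr = "2009_10", Gender = "female", Height = 164.2)

# clean_names() rewrites the column names; the values are left untouched
cleaned <- clean_names(raw)

names(raw)
names(cleaned)
```

With consistent lowercase names, you never have to remember whether a variable was written Height, height, or HEIGHT.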

Get Data

Let’s import the NHANES data into a data frame and then take a look at it.

library(tidyverse)  # provides read_csv(), the %>% pipe, and the dplyr functions

nhanes <- read_csv("nhanes.csv")

nhanes

I’ve also made a codebook in case you want to learn more about the variables (most are self-explanatory from their names).

nhanes_codebook <- read_csv("nhanes-codebook.csv") 

nhanes_codebook

select

nhanes %>% 
  select(height)

mutate

nhanes %>%
  mutate(height_inches = height / 2.54) %>% 
  select(height, height_inches)

filter

nhanes %>% 
  filter(height > 150) %>% 
  select(height)
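One thing worth knowing about filter is that it keeps only rows where the condition is TRUE, so rows with a missing height are dropped along with the short ones. Here is a small sketch on a toy tibble (the values are made up for illustration, not taken from NHANES):

```r
library(dplyr)

# Toy data: four people, one with a missing height (illustrative values)
toy <- tibble(height = c(140, 155, NA, 172))

# Only rows where height > 150 is TRUE survive; the NA row is dropped too
tall <- toy %>%
  filter(height > 150)

tall
```

This is roughly what an Excel filter on a column does, except that the result is a new data frame and the original stays unchanged.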

summarize

nhanes %>% 
  summarize(mean_height = mean(height, na.rm = TRUE))
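The na.rm = TRUE argument matters here. A quick sketch with a made-up vector shows why: by default, a single missing value makes the mean itself missing.

```r
# Toy vector with one missing value (illustrative values)
heights <- c(150, NA, 170)

# Without na.rm, the result is NA
mean_with_na <- mean(heights)

# With na.rm = TRUE, missing values are dropped before averaging
mean_without_na <- mean(heights, na.rm = TRUE)

mean_with_na
mean_without_na
```

Excel's AVERAGE skips blank cells automatically, so this is one place where R makes you state explicitly what should happen to missing data.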

group_by

nhanes %>% 
  group_by(gender) %>% 
  summarize(mean_height = mean(height, na.rm = TRUE))
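If you think of group_by plus summarize as a pivot table, it can help to see it on data small enough to check by hand. The sketch below uses a made-up tibble (not the NHANES data) and also adds a count per group with n(), which is my addition rather than part of the example above.

```r
library(dplyr)

# Toy data: two groups, one missing height (illustrative values)
toy <- tibble(
  gender = c("female", "male", "female", "male"),
  height = c(160, 175, NA, 181)
)

# One row per gender: mean height (ignoring NA) and number of rows
by_gender <- toy %>%
  group_by(gender) %>%
  summarize(
    mean_height = mean(height, na.rm = TRUE),
    n = n()
  )

by_gender
```

Note that n() counts rows, including the one with a missing height, while the mean is computed only from the non-missing values.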

arrange

nhanes %>% 
  arrange(height) %>% 
  select(height)
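By default arrange sorts from smallest to largest. To sort from largest to smallest, wrap the variable in desc(). Here is a small sketch on a toy tibble (made-up values, not the NHANES data):

```r
library(dplyr)

# Toy data with one missing height (illustrative values)
toy <- tibble(height = c(155, NA, 172, 140))

# Tallest first; rows with a missing height are placed at the end
tallest_first <- toy %>%
  arrange(desc(height))

tallest_first
```

This is the equivalent of Excel's Sort Largest to Smallest. Missing values end up at the bottom whether you sort ascending or descending.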