Create New Column Based on Values in Another Column in R

Here is a short post on how to create a new column with the values from one column based on the values in separate column. There are a few situations where this might be useful. One is if you have data in long format with your surveys questions in one column and you want to separate the scores of one of the surveys into a separate column for time-series/longitudinal analyses. Another situation might be that you have several conditions and you want to create a column of scores retaining scores of a subset of the conditions.

I recently needed to do this to tidy up some data in long format and noticed that there are no obvious blog posts or stack overflow entries addressing this data tidying scenario. The solutions you will find come with a caveat: if you create a new vector based on values in another vector, you will likely generate a vector with a length that does not match that of the data frame, precluding you from adding it to the data frame.

I will provide an abstract example and then a real-life example.

Below we have a table with columns varA and varB. We want to make a new column, varC, with values corresponding to rows with ‘c’ in the varA column.

varAvarB
a12
b34
c23
a34
b23
c13

Let's first illustrate the problem. For example, you would think that df$newVar <- df$varB[(varA == "c")] would create a new column with values from varB that correspond with varA. The problem is that this approach, and many other approaches, will create a vector the length of the cases that fit the parameters you have for the new vector. In this case, since there are only two rows with “c”, the length of the new vector would be 2.

Okay, now let's work on the solution. We can use the mutate and ifelse functions from the tidyverse to accomplish our task:

# Load in the `tidyverse` package for the `dplyr` package tidying functions and piping.
library(tidyverse)

df <- df %>% 
	mutate(varC = ifelse(varA == "b", varB, NA) 

What does this code do?

  1. From the data frame, create varC using and ifelse statement and assign it to the original data frame.
  2. In the ifelse statement, we indicate that if values in varA equal “b”, then keep the corresponding value from column varB, else NA.

Here is the table you would get from this R code.

varAvarBvarC
a12NA
b3434
c23NA
a34NA
b2323
c13NA

Since these abstract examples can be a little tricky to apply to our own stuff, here's an example resembling a real data set, such as an experiential momentary assessment data set. Below is a table with data in long format. We want to make separate a column for negative affect subscale of the Positive and Negative Affect Schedule (PANAS-X) to conduct analyses solely on those scores. Although this can be accomplished with the data in wide format, you may want to keep it in long format depending on the analyses or data visualization you are interested in conducting.

idtime_indexpanas_x_subscalescore
0011panas_x_sad23
0012pans_x_pa11
0013panas_x_na6
0021panas_x_sad3
0022panas_x_pa2
0023panas_x_na4
0031panas_x_sad15
0032panas_x_pa10
0033panas_x_na8
df <- df %>%
	mutate(panas_x_na = ifelse(panas_x_subscale == "panas_x_na", score, NA)

Below is a data table amended with a vector containing values copied from the score column corresponding with “panas_x_na” in the panas_x_subscale column.

idtime_indexpanas_x_subscalescorepanas_x_na
0011panas_x_sad23NA
0012pans_x_pa11NA
0013panas_x_na66
0021panas_x_sad3NA
0022panas_x_pa2NA
0023panas_x_na44
0031panas_x_sad15NA
0032panas_x_pa10NA
0033panas_x_na88