dataframe - R data.frame get value from variable which is selected by another variable, vectorized -


I have data that comes to me with many similar variables, plus an additional variable that indicates which one of those similar variables I actually want. Using a loop I can get the correct value, but the data are large, the loop is slow, and it seems like this should be vectorizable. I just haven't figured out how.

Edit: the selected variable will be used as a new variable in the same data frame, so order matters. There are many other variables not shown in the example given below.

Example data set:

set.seed(0)
df <- data.frame(yr1 = sample(1000:1100, 8),
                 yr2 = sample(2000:2100, 8),
                 yr3 = sample(3000:3100, 8),
                 yr4 = sample(4000:4100, 8),
                 var = paste0("yr", sample(1:4, 8, replace = TRUE)))
# df
#
#    yr1  yr2  yr3  yr4 var
# 1 1090 2066 3050 4012 yr3
# 2 1026 2062 3071 4026 yr2
# 3 1036 2006 3098 4038 yr1
# 4 1056 2020 3037 4001 yr4
# 5 1088 2017 3075 4037 yr3
# 6 1019 2065 3089 4083 yr4
# 7 1085 2036 3020 4032 yr1
# 8 1096 2072 3061 4045 yr3

This loop method does the trick, but it is slow and awkward:

ycode <- character(nrow(df))
for(i in 1:nrow(df)) {
  ycode[i] <- df[i, df$var[i]]
}
df$ycode <- ycode

# df
#    yr1  yr2  yr3  yr4 var ycode
# 1 1090 2066 3050 4012 yr3  3050
# 2 1026 2062 3071 4026 yr2  2062
# 3 1036 2006 3098 4038 yr1  1036
# 4 1056 2020 3037 4001 yr4  4001
# 5 1088 2017 3075 4037 yr3  3075
# 6 1019 2065 3089 4083 yr4  4083
# 7 1085 2036 3020 4032 yr1  1085
# 8 1096 2072 3061 4045 yr3  3061

It seems I should be able to vectorize this, like so:

df$ycode <- df[, df$var] 

But I find the result surprising:

#    yr1  yr2  yr3  yr4 var ycode.yr3 ycode.yr2 ycode.yr1 ycode.yr4 ycode.yr3.1 ycode.yr4.1 ycode.yr1.1 ycode.yr3.2
# 1 1090 2066 3050 4012 yr3      3050      2066      1090      4012        3050        4012        1090        3050
# 2 1026 2062 3071 4026 yr2      3071      2062      1026      4026        3071        4026        1026        3071
# 3 1036 2006 3098 4038 yr1      3098      2006      1036      4038        3098        4038        1036        3098
# 4 1056 2020 3037 4001 yr4      3037      2020      1056      4001        3037        4001        1056        3037
# 5 1088 2017 3075 4037 yr3      3075      2017      1088      4037        3075        4037        1088        3075
# 6 1019 2065 3089 4083 yr4      3089      2065      1019      4083        3089        4083        1019        3089
# 7 1085 2036 3020 4032 yr1      3020      2036      1085      4032        3020        4032        1085        3020
# 8 1096 2072 3061 4045 yr3      3061      2072      1096      4045        3061        4045        1096        3061

I tried numerous variations on *apply, but none of them came close. Some attempts:

> apply(df, 1, function(x) x[x$var])
Error in x$var : $ operator is invalid for atomic vectors
> apply(df, 1, function(x) x[x[var]])
Error in x[var] : invalid subscript type 'closure'

Any ideas? Many thanks.

We can use row/column indexing. It should be fast compared to the loop.

df[-ncol(df)][cbind(1:nrow(df), match(df$var, head(names(df), -1)))]
#[1] 3050 2062 1036 4001 3075 4083 1085 3061
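To see why this works, here is a minimal sketch of the same row/column indexing on a tiny made-up data frame. The names `toy`, `value_cols`, `pick`, and `picked` are placeholders for illustration and are not part of the question's data. The sketch assumes the candidate columns are all numeric and that you can list them by name, which may be more convenient than dropping the last column when the data frame has many other variables.

```r
# Toy data: three value columns plus a selector column (all names are hypothetical).
toy <- data.frame(x = c(1, 2, 3),
                  y = c(10, 20, 30),
                  z = c(100, 200, 300),
                  pick = c("y", "z", "x"),
                  stringsAsFactors = FALSE)

# The columns that the selector is allowed to point at.
value_cols <- c("x", "y", "z")

# Build one (row, column) pair per row: row i, and the position of pick[i]
# within value_cols.
idx <- cbind(seq_len(nrow(toy)),
             match(toy$pick, value_cols))

# Indexing a matrix with a two-column matrix returns one element per pair,
# in row order, so the result lines up with the data frame.
toy$picked <- as.matrix(toy[value_cols])[idx]
toy$picked
```

Restricting to `value_cols` before calling `as.matrix()` keeps the selector column out of the matrix, so the values stay numeric rather than being coerced to character.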

Just for diversity, here is a data.table solution (it should be slow compared to the indexing above). We convert the 'data.frame' to a 'data.table' (setDT(df)) and, grouped by the sequence of rows, get the value of the column named in 'var' after converting it to character class.

library(data.table)
setDT(df)[, ycode := get(as.character(var)), 1:nrow(df)]
df
#    yr1  yr2  yr3  yr4 var ycode
#1: 1090 2066 3050 4012 yr3  3050
#2: 1026 2062 3071 4026 yr2  2062
#3: 1036 2006 3098 4038 yr1  1036
#4: 1056 2020 3037 4001 yr4  4001
#5: 1088 2017 3075 4037 yr3  3075
#6: 1019 2065 3089 4083 yr4  4083
#7: 1085 2036 3020 4032 yr1  1085
#8: 1096 2072 3061 4045 yr3  3061
