dataframe - R data.frame: get the value from a variable that is selected by another variable, vectorized
I have data that comes to me with many similar variables, plus an additional variable that indicates which one of those similar variables I want. Using a loop I can get the correct value, but the data is large, the loop is slow, and it seems like this should be vectorizable. I just haven't figured out how.

EDIT: The selected variable is to be used as a new variable in the same data frame, so order matters. There are many other variables not shown in the example given below.

Example data set:

    set.seed(0)
    df <- data.frame(yr1 = sample(1000:1100, 8),
                     yr2 = sample(2000:2100, 8),
                     yr3 = sample(3000:3100, 8),
                     yr4 = sample(4000:4100, 8),
                     var = paste0("yr", sample(1:4, 8, replace = TRUE)))
    df
    #    yr1  yr2  yr3  yr4 var
    # 1 1090 2066 3050 4012 yr3
    # 2 1026 2062 3071 4026 yr2
    # 3 1036 2006 3098 4038 yr1
    # 4 1056 2020 3037 4001 yr4
    # 5 1088 2017 3075 4037 yr3
    # 6 1019 2065 3089 4083 yr4
    # 7 1085 2036 3020 4032 yr1
    # 8 1096 2072 3061 4045 yr3

This loop method does the trick, but it is slow and awkward:

    ycode <- numeric(nrow(df))          # one result slot per row
    for (i in seq_len(nrow(df))) {
      ycode[i] <- df[i, df$var[i]]      # row i, column named by var in row i
    }
    df$ycode <- ycode
    df
    #    yr1  yr2  yr3  yr4 var ycode
    # 1 1090 2066 3050 4012 yr3  3050
    # 2 1026 2062 3071 4026 yr2  2062
    # 3 1036 2006 3098 4038 yr1  1036
    # 4 1056 2020 3037 4001 yr4  4001
    # 5 1088 2017 3075 4037 yr3  3075
    # 6 1019 2065 3089 4083 yr4  4083
    # 7 1085 2036 3020 4032 yr1  1085
    # 8 1096 2072 3061 4045 yr3  3061

It seems I should be able to vectorize this, like so:

    df$ycode <- df[, df$var]

but I find the result surprising:

    #    yr1  yr2  yr3  yr4 var ycode.yr3 ycode.yr2 ycode.yr1 ycode.yr4 ycode.yr3.1 ycode.yr4.1 ycode.yr1.1 ycode.yr3.2
    # 1 1090 2066 3050 4012 yr3      3050      2066      1090      4012        3050        4012        1090        3050
    # 2 1026 2062 3071 4026 yr2      3071      2062      1026      4026        3071        4026        1026        3071
    # 3 1036 2006 3098 4038 yr1      3098      2006      1036      4038        3098        4038        1036        3098
    # 4 1056 2020 3037 4001 yr4      3037      2020      1056      4001        3037        4001        1056        3037
    # 5 1088 2017 3075 4037 yr3      3075      2017      1088      4037        3075        4037        1088        3075
    # 6 1019 2065 3089 4083 yr4      3089      2065      1019      4083        3089        4083        1019        3089
    # 7 1085 2036 3020 4032 yr1      3020      2036      1085      4032        3020        4032        1085        3020
    # 8 1096 2072 3061 4045 yr3      3061      2072      1096      4045        3061        4045        1096        3061

I tried numerous variations on *apply, but none of them came close. Some attempts:

    > apply(df, 1, function(x) x[x$var])
    Error in x$var : $ operator is invalid for atomic vectors
    > apply(df, 1, function(x) x[x[var]])
    Error in x[var] : invalid subscript type 'closure'

Any ideas? Many thanks.
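A minimal sketch of why these attempts misbehave, reusing the question's example data. `df[, df$var]` selects whole columns, one per entry of `var` (duplicates included), so it returns an 8 x 8 data frame; assigning a data frame to `df$ycode` stores it as a packed column, which prints as ycode.yr3, ycode.yr2, and so on. `apply()` first converts the data frame to a matrix, and because `var` is not numeric that is a character matrix, so each row arrives as a named character vector. `$` is not defined for atomic vectors, and a bare `var` refers to the function `stats::var`, hence the 'closure' error. Indexing the row by name with `[[` does run, but the result is character and the function is still called once per row, so it should not be expected to beat the loop:

```r
set.seed(0)
df <- data.frame(yr1 = sample(1000:1100, 8),
                 yr2 = sample(2000:2100, 8),
                 yr3 = sample(3000:3100, 8),
                 yr4 = sample(4000:4100, 8),
                 var = paste0("yr", sample(1:4, 8, replace = TRUE)))

# Column selection: one whole column per element of df$var, not one cell per row
picked <- df[, df$var]
dim(picked)  # 8 rows, 8 columns

# apply() passes each row as a named character vector, so index it by name with [[
ycode_chr <- apply(df, 1, function(x) x[[x[["var"]]]])

# The values come back as character strings; convert them back to numbers
ycode <- as.numeric(ycode_chr)
```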
We can use row/column indexing. It should be fast compared to the loop.
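
    # drop the last column (var), then pick one cell per row with a (row, column) index matrix
    df[-ncol(df)][cbind(1:nrow(df), match(df$var, head(names(df), -1)))]
    #[1] 3050 2062 1036 4001 3075 4083 1085 3061

Just for diversity, a data.table solution (which should be slower than the indexing above): convert the 'data.frame' to a 'data.table' (setDT(df)), group by the sequence of rows, and get the value of the column named in 'var' after converting it to character class.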

    library(data.table)
    # one group per row; get() looks up the column whose name is stored in var
    setDT(df)[, ycode := get(as.character(var)), by = 1:nrow(df)]
    df
    #    yr1  yr2  yr3  yr4 var ycode
    #1: 1090 2066 3050 4012 yr3  3050
    #2: 1026 2062 3071 4026 yr2  2062
    #3: 1036 2006 3098 4038 yr1  1036
    #4: 1056 2020 3037 4001 yr4  4001
    #5: 1088 2017 3075 4037 yr3  3075
    #6: 1019 2065 3089 4083 yr4  4083
    #7: 1085 2036 3020 4032 yr1  1085
    #8: 1096 2072 3061 4045 yr3  3061
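The indexing answer assumes `var` is the last column and that every other column is a candidate (`df[-ncol(df)]`). Given the EDIT (many other variables, result stored back in the same data frame), a sketch that names the candidate columns explicitly may be safer. The vector `cand` is an assumption here: it is simply the columns whose names look like yr<digits>, and the extra column `other` is hypothetical, added only to show that unrelated columns are ignored. Converting just the candidates to a matrix keeps the result numeric, and `match()` turns each `var` entry into a column position, so the lookup is a single vectorized subset with no per-row loop or grouping:

```r
set.seed(0)
df <- data.frame(yr1 = sample(1000:1100, 8),
                 yr2 = sample(2000:2100, 8),
                 yr3 = sample(3000:3100, 8),
                 yr4 = sample(4000:4100, 8),
                 var = paste0("yr", sample(1:4, 8, replace = TRUE)),
                 other = letters[1:8])  # hypothetical unrelated column

# Assumption: the candidate columns are exactly those named yr<digits>
cand <- grep("^yr[0-9]+$", names(df), value = TRUE)

# Matrix of only the candidate columns (integer here, so no coercion to character)
m <- as.matrix(df[cand])

# Column position of the selected variable in each row (NA if var names no candidate)
col_idx <- match(as.character(df$var), cand)

# Two-column (row, column) index matrix: selects exactly one cell per row
df$ycode <- m[cbind(seq_len(nrow(df)), col_idx)]
```

Because the lookup goes through column names rather than positions, it does not depend on where `var` sits or how many other columns the data frame has. If a `var` entry names a column outside `cand`, `match()` gives NA and that row's result is NA rather than an error.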