R - Change a dataset to a binary dataset
My dataset:

    df = data.frame(x = c(1,4,6,NA,7,NA,9,10,4,NA),
                    y = c(10,12,NA,NA,14,18,20,15,12,17),
                    z = c(225,198,NA,NA,NA,130,NA,200,NA,99))
    df
        x  y   z
    1   1 10 225
    2   4 12 198
    3   6 NA  NA
    4  NA NA  NA
    5   7 14  NA
    6  NA 18 130
    7   9 20  NA
    8  10 15 200
    9   4 12  NA
    10 NA 17  99

I want to change this dataset into a binary dataset as follows:
observed elements = 1
missing elements = 0
       x y z
    1  1 1 1
    2  1 1 1
    3  1 0 0
    4  0 0 0
    5  1 1 0
    6  0 1 1
    7  1 1 0
    8  1 1 1
    9  1 1 0
    10 0 1 1

How can I do this in R? I tried the code ifelse(df=NA, 0, 1).
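A note on why the attempt above does not work, as a minimal sketch on a small made-up data frame (the 3-row `df` and the name `out` are illustrative only, not the asker's data). Inside a function call, `df=NA` is read as a named argument rather than a comparison, so `ifelse` complains about an unused argument. Even `df == NA` would not help, because any comparison with `NA` yields `NA`. The missing-value test is `is.na()`:

```r
# Small illustrative data frame (not the dataset from the question)
df <- data.frame(x = c(1, NA, 6), y = c(10, 12, NA))

# Comparing with NA never returns TRUE/FALSE; every cell comes back as NA
all_na <- all(is.na(df == NA))

# is.na() detects missing values; ifelse() then maps missing -> 0, observed -> 1
out <- df
out[] <- ifelse(is.na(df), 0, 1)   # assigning into out[] keeps the data.frame structure
out
```

This gives the same 0/1 layout the question asks for, although the `!is.na` route avoids the extra `ifelse` call.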
You can use !is.na, like this:
    # df[] <- as.numeric(!is.na(df)) # <- original answer
    df[] <- as.integer(!is.na(df))   # <- @docendodiscimus
    df
    #    x y z
    # 1  1 1 1
    # 2  1 1 1
    # 3  1 0 0
    # 4  0 0 0
    # 5  1 1 0
    # 6  0 1 1
    # 7  1 1 0
    # 8  1 1 1
    # 9  1 1 0
    # 10 0 1 1

If efficiency is of concern, you can try using the "data.table" package:
    library(data.table)
    as.data.table(df)[, lapply(.SD, function(x) as.numeric(!is.na(x)))]
    #     x y z
    #  1: 1 1 1
    #  2: 1 1 1
    #  3: 1 0 0
    #  4: 0 0 0
    #  5: 1 1 0
    #  6: 0 1 1
    #  7: 1 1 0
    #  8: 1 1 1
    #  9: 1 1 0
    # 10: 0 1 1

Or, to assign while replacing:
    as.data.table(df)[, (names(df)) := lapply(.SD, function(x) as.numeric(!is.na(x)))][]

Update
If you're interested in further benchmarks, you can check out this gist.
Summary of the benchmarking:
- If it's sheer speed you're after, go with the "data.table" approach.
- If you want efficient code in base R, as.integer and + are virtually neck and neck, and I think you know where my recommendation would lie.
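
For reference, the two base R variants mentioned in the summary can be sketched side by side. This is only an illustration on made-up data (the names `a`, `b`, `big` and the data sizes are assumptions, not taken from the gist), and the timings will vary by machine, so none are quoted here:

```r
# Small illustrative data frame (not the dataset from the question)
df <- data.frame(x = c(1, 4, NA), y = c(NA, 12, 17))

# Variant 1: as.integer() turns the logical matrix into a 0/1 integer vector
a <- df
a[] <- as.integer(!is.na(df))

# Variant 2: unary plus coerces the logical matrix to integer 0/1
b <- df
b[] <- +(!is.na(df))

# Both variants should hold the same 0/1 values
same_values <- all(unlist(a) == unlist(b))

# Rough timing on a larger, randomly generated data frame (hypothetical size)
set.seed(1)
big <- as.data.frame(matrix(sample(c(1, NA), 1e6, replace = TRUE), ncol = 10))
system.time(as.integer(!is.na(big)))
system.time(+(!is.na(big)))
```

For a proper comparison, repeated runs with a dedicated benchmarking package would be more reliable than a single `system.time` call.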