R - Change a dataset to a binary dataset


My dataset:

    df = data.frame(x = c(1, 4, 6, NA, 7, NA, 9, 10, 4, NA),
                    y = c(10, 12, NA, NA, 14, 18, 20, 15, 12, 17),
                    z = c(225, 198, NA, NA, NA, 130, NA, 200, NA, 99))
    df
        x  y   z
    1   1 10 225
    2   4 12 198
    3   6 NA  NA
    4  NA NA  NA
    5   7 14  NA
    6  NA 18 130
    7   9 20  NA
    8  10 15 200
    9   4 12  NA
    10 NA 17  99

I want to change this dataset to a binary dataset as follows:

observed elements = 1

missing elements = 0

       x y z
    1  1 1 1
    2  1 1 1
    3  1 0 0
    4  0 0 0
    5  1 1 0
    6  0 1 1
    7  1 1 0
    8  1 1 1
    9  1 1 0
    10 0 1 1

How can I do this in R? I am trying this code: ifelse(df=NA, 0, 1).

You can use !is.na, like this:

    # df[] <- as.numeric(!is.na(df))  # <- original answer
    df[] <- as.integer(!is.na(df))    # <- @docendodiscimus
    df
    #    x y z
    # 1  1 1 1
    # 2  1 1 1
    # 3  1 0 0
    # 4  0 0 0
    # 5  1 1 0
    # 6  0 1 1
    # 7  1 1 0
    # 8  1 1 1
    # 9  1 1 0
    # 10 0 1 1
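As a side note on why the attempt in the question does not work: inside a function call, `df=NA` is parsed as a named argument rather than a comparison, and even `df == NA` would not help, because any comparison with NA returns NA. The small sketch below illustrates this on a made-up vector (the names `x`, `cmp`, `miss`, and `obs` are only for illustration):

```r
# Illustrative vector with one missing value (not from the question's data).
x <- c(1, NA, 3)

# Comparing against NA with == yields NA for every element,
# so it cannot be used to detect missing values.
cmp <- x == NA

# is.na() is the function intended for this: TRUE where a value is missing.
miss <- is.na(x)

# Negate and coerce to integer: observed = 1, missing = 0.
obs <- as.integer(!miss)
```

Here `cmp` contains only NA values, while `obs` holds the 1/0 indicator, which is the same idea the answer applies to the whole data frame.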

If efficiency is of concern, you can try using the "data.table" package:

    library(data.table)
    as.data.table(df)[, lapply(.SD, function(x) as.numeric(!is.na(x)))]
    #     x y z
    #  1: 1 1 1
    #  2: 1 1 1
    #  3: 1 0 0
    #  4: 0 0 0
    #  5: 1 1 0
    #  6: 0 1 1
    #  7: 1 1 0
    #  8: 1 1 1
    #  9: 1 1 0
    # 10: 0 1 1

Or assign while replacing:

    as.data.table(df)[, (names(df)) := lapply(.SD, function(x) as.numeric(!is.na(x)))][]

Update

If you're interested in further benchmarks, you can check out this gist.

Summary of the benchmarking:

  • If it's sheer speed you're after, go with the "data.table" approach.
  • If you want efficient code in base R, as.integer and + are virtually neck and neck, but I think you know where my recommendation would lie.
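If you want to check the base R comparison on your own machine, a rough sketch along these lines may help. The data size, the share of NA values, and the names `big`, `a`, `b`, `t_int`, and `t_plus` are all assumptions made for illustration; they are not taken from the gist, and timings will vary between systems.

```r
# Hypothetical test data: 1e5 rows, 3 columns, about 30% NA per column (assumed).
set.seed(1)
n <- 1e5
big <- data.frame(x = rnorm(n), y = rnorm(n), z = rnorm(n))
big[] <- lapply(big, function(col) {
  col[sample(n, 0.3 * n)] <- NA
  col
})

# Work on copies so each approach starts from the same input.
a <- big
b <- big

# Approach 1: as.integer() on the logical matrix returned by !is.na().
t_int <- system.time(a[] <- as.integer(!is.na(a)))

# Approach 2: unary + coerces the logical matrix to integer.
t_plus <- system.time(b[] <- +(!is.na(b)))

# Both results should contain the same 0/1 values.
all(a == b)
```

Printing `t_int` and `t_plus` shows the elapsed times. For more reliable numbers you would want to repeat each call several times instead of relying on a single run.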
