Create a count matrix in R
I have a large data frame. A few rows and columns are shown below:
   id1 id2 id3 id4
s1   2   4   2   6
s2   2   1   3   2
s3   2   2   2   2
s4   3   0   2   2
For each row, I need a matrix holding the count of each number in the range of the id values. Since the largest id value is 6, this creates a matrix with 7 columns, i.e. 0 to 6, filled with the count values.
Sample output:
   0 1 2 3 4 5 6
s1 0 0 2 0 1 0 1
s2 0 1 2 1 0 0 0
s3 0 0 4 0 0 0 0
s4 1 0 2 1 0 0 0
Is there a way of doing this in R?
This is a perfect situation to use apply + tabulate, except for the inclusion of zeroes in your data and your need to include them.
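Since you need to include the tabulation of zeroes, you can make a small modification so that tabulate effectively starts at 0 instead of 1: add 1 to every value before tabulating, then label the resulting bins 0 to the maximum value.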
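To see why the shift is needed, here is a minimal sketch using row s4 of the sample data. The variable names (`s4`, `nbins`, `plain`, `shifted`) are only illustrative. `tabulate` counts positive integers, so a zero is dropped unless every value is moved up by 1 first.

```r
# Row s4 from the sample data in the question (variable names are illustrative)
s4 <- c(3, 0, 2, 2)
nbins <- 6  # largest value in the sample data

# tabulate() counts the integers 1..nbins, so the 0 in s4 is ignored
plain <- tabulate(s4, nbins = nbins)
plain
# 0 2 1 0 0 0   (counts for the values 1..6)

# Shifting by 1 makes bin 1 hold the zeroes, bin 2 the ones, and so on
shifted <- tabulate(s4 + 1, nbins = nbins + 1)
names(shifted) <- 0:nbins
shifted
# 0 1 2 3 4 5 6
# 1 0 2 1 0 0 0
```

The shifted result matches the s4 row of the sample output in the question.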
Here's a function that puts that approach in place:
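dftabulate <- function(indf) {
  # Largest value in the data; the bins run from 0 to nbins
  nbins <- max(indf)
  # Shift by 1 so zeroes are counted, tabulate each row,
  # then label the columns 0..nbins
  `colnames<-`(t(apply(indf + 1, 1, tabulate, nbins = nbins + 1)), 0:nbins)
}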
Here it is applied to the sample data.
dftabulate(mydf)
#    0 1 2 3 4 5 6
# s1 0 0 2 0 1 0 1
# s2 0 1 2 1 0 0 0
# s3 0 0 4 0 0 0 0
# s4 1 0 2 1 0 0 0
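The call above uses a `mydf` object that is not constructed anywhere in the post. As a sketch, assuming `mydf` is simply the four sample rows from the question, the following builds it and repeats the function so the snippet runs on its own.

```r
# Sample data from the question; this construction of mydf is an assumption
mydf <- data.frame(
  id1 = c(2, 2, 2, 3),
  id2 = c(4, 1, 2, 0),
  id3 = c(2, 3, 2, 2),
  id4 = c(6, 2, 2, 2),
  row.names = c("s1", "s2", "s3", "s4")
)

# Same function as in the answer, repeated so this block is self-contained
dftabulate <- function(indf) {
  nbins <- max(indf)
  `colnames<-`(t(apply(indf + 1, 1, tabulate, nbins = nbins + 1)), 0:nbins)
}

res <- dftabulate(mydf)

# Each row of the result should sum to the number of id columns
rowSums(res) == ncol(mydf)
```

Checking that every row sums to the number of columns is a quick way to confirm that no value, including zero, was dropped.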
You specify that you have a "large" data.frame but don't describe how large, so I'm not sure how relevant the following benchmark is.
However, I can share the logic behind using this approach: tabulate is a fast function, so I thought I would make use of its efficiency.
Here's a benchmark:
set.seed(1)
nrow = 10000
ncol = 100
min = 0
max = 500
mydf <- data.frame(
  matrix(sample(min:max, nrow*ncol, TRUE), nrow = nrow, ncol = ncol,
         dimnames = list(paste0("s", 1:nrow), paste0("id", 1:ncol))))

# Alternative based on table()
fun2 <- function(df1 = mydf) {
  tbl <- table(c(row(df1)), factor(unlist(df1), levels=0:max))
  dimnames(tbl)[[1]] <- row.names(df1)
  tbl
}

# Alternative based on mtabulate(), which comes from the qdapTools package
fun3 <- function(df1 = mydf) mtabulate(as.data.frame(t(df1)))

system.time(dftabulate(mydf))
#    user  system elapsed
#   0.000   0.000   0.154
system.time(fun2(mydf))
#    user  system elapsed
#   0.000   0.000   1.018
system.time(fun3(mydf))
#    user  system elapsed
#   4.560   0.000   3.081
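Timings are only meaningful if the approaches return the same counts. The following is a small sanity check, not part of the original benchmark: it compares the shift-and-tabulate idea against a `table()` call on a small random matrix. The matrix size, the seed, and the names `m`, `top`, `viaTabulate`, `viaTable`, and `agree` are arbitrary choices for illustration.

```r
set.seed(42)
# Small random matrix of non-negative integers (sizes are arbitrary)
m <- matrix(sample(0:9, 5 * 8, replace = TRUE), nrow = 5,
            dimnames = list(paste0("s", 1:5), paste0("id", 1:8)))
top <- max(m)

# Shift-and-tabulate, row by row
viaTabulate <- t(apply(m + 1, 1, tabulate, nbins = top + 1))

# table() with explicit factor levels so that empty bins are kept
viaTable <- table(c(row(m)), factor(c(m), levels = 0:top))

# Both hold one count per (row, value) cell; compare them cell by cell
agree <- all(viaTabulate == as.vector(viaTable))
agree
```

If `agree` is `TRUE`, the two methods produce identical counts on this input, so any difference in the benchmark comes down to speed alone.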