dplyr - R: Split weighted column into equal-sized buckets

April 15, 2011

I use cut_number (from ggplot2) to split a column into buckets with approximately the same number of observations, but my dataset is in a compact form where each row has a weight (the number of observations it represents).

Example data frame:

df <- data.frame(
  x = c(18, 17, 18.5, 20, 20.5, 24, 24.4, 18.3, 31, 34, 39, 20, 19, 34, 23),
  weight = c(1, 10, 3, 6, 19, 20, 34, 66, 2, 3, 1, 6, 9, 15, 21)
)

If there were one observation of x per row, I could use df$bucket <- cut_number(df$x, 3) to segment x into 3 buckets with approximately the same number of observations. How can I take into account the fact that each row is weighted by a number of observations? I'd like to avoid splitting each row into weight rows, since the original data frame already has millions of rows.
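For reference, the brute-force expansion that the question wants to avoid can be sketched in base R on the small example. This is only a baseline, and it assumes that cut_number picks its cut points from equally spaced quantiles; expanded and breaks are illustrative names that do not appear in the original post.

```r
df <- data.frame(
  x = c(18, 17, 18.5, 20, 20.5, 24, 24.4, 18.3, 31, 34, 39, 20, 19, 34, 23),
  weight = c(1, 10, 3, 6, 19, 20, 34, 66, 2, 3, 1, 6, 9, 15, 21)
)

# Repeat each x value `weight` times: one element per observation.
# Fine for a few hundred observations, but this is exactly the step
# that does not scale to millions of weighted rows.
expanded <- rep(df$x, times = df$weight)

# Equally spaced quantiles of the expanded vector give the 3-bucket cut points
# (assumption: this mirrors how cut_number chooses its breaks).
breaks <- quantile(expanded, probs = seq(0, 1, length.out = 4))

# Apply the cut points to the compact data frame.
df$bucket <- cut(df$x, breaks = breaks, include.lowest = TRUE)

# Total weight (number of observations) that lands in each bucket.
bucket_totals <- tapply(df$weight, df$bucket, sum)
bucket_totals
```

Any approach that works on the compact form can be compared against these totals on small inputs.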

Based on the comments, I think this may be the interval set you are seeking. Apologies for the general un-R-ness of it:

dftest <- data.frame(x = 1:6, weight = c(1, 1, 1, 1, 4, 1))

f <- function(df, n) {
  # Target number of observations (total weight) per bucket
  interval <- round(sum(df$weight) / n)
  buckets <- vector(mode = "integer", length = nrow(df))
  bucketnum <- 1L
  count <- 0
  for (i in seq_len(nrow(df))) {
    count <- count + df$weight[i]
    buckets[i] <- bucketnum
    # Once the running weight reaches the target, start the next bucket
    if (count >= interval) {
      bucketnum <- bucketnum + 1L
      count <- 0
    }
  }
  return(buckets)
}

Running the function buckets the items as follows:

dftest$bucket <- f(dftest, 3)

#    x weight bucket
#  1 1      1      1
#  2 2      1      1
#  3 3      1      1
#  4 4      1      2
#  5 5      4      2
#  6 6      1      3
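A quick way to see how even the result is: sum the weights inside each bucket. The sketch below hardcodes the bucket column from the output shown above, so it runs on its own; bucket_totals is an illustrative name.

```r
# Bucket assignments copied from the printed output for dftest.
dftest <- data.frame(
  x = 1:6,
  weight = c(1, 1, 1, 1, 4, 1),
  bucket = c(1, 1, 1, 2, 2, 3)
)

# Sum the weights within each bucket; the per-bucket target was round(9 / 3) = 3.
bucket_totals <- tapply(dftest$weight, dftest$bucket, sum)
bucket_totals
# Buckets 1, 2 and 3 hold 3, 5 and 1 observations respectively.
```

The row with weight 4 cannot be split, so bucket 2 overshoots the target and bucket 3 is left with the remainder.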

For the example data frame from the question:

df$bucket <- f(df, 3)

#        x weight bucket
#  1  18.0      1      1
#  2  17.0     10      1
#  3  18.5      3      1
#  4  20.0      6      1
#  5  20.5     19      1
#  6  24.0     20      1
#  7  24.4     34      1
#  8  18.3     66      2
#  9  31.0      2      2
#  10 34.0      3      2
#  11 39.0      1      2
#  12 20.0      6      3
#  13 19.0      9      3
#  14 34.0     15      3
#  15 23.0     21      3
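Note that f walks the rows in the order they appear, so in the output above x = 18.3 lands in bucket 2 while x = 24.4 is in bucket 1. If the buckets should follow the values of x, as cut_number's would, one option is to sort by x and assign buckets from the cumulative weight. The following is a minimal sketch of that idea, assuming strictly positive weights; weighted_buckets is a hypothetical helper, and it is a different technique from f (a cumulative share instead of a counter that resets).

```r
# Hypothetical helper (not from the original post): assign each row to one of
# n buckets by the share of total weight that lies before it in x order.
weighted_buckets <- function(x, weight, n) {
  ord <- order(x)                      # visit rows from smallest to largest x
  w <- as.numeric(weight[ord])         # doubles avoid integer overflow below
  cum <- cumsum(w)                     # running total of observations
  total <- cum[length(cum)]
  before <- cum - w                    # weight strictly before each row
  # Integer division keeps the arithmetic exact for whole-number weights.
  bucket_sorted <- as.integer((before * n) %/% total) + 1L
  bucket <- integer(length(x))
  bucket[ord] <- bucket_sorted         # map back to the original row order
  bucket
}

df <- data.frame(
  x = c(18, 17, 18.5, 20, 20.5, 24, 24.4, 18.3, 31, 34, 39, 20, 19, 34, 23),
  weight = c(1, 10, 3, 6, 19, 20, 34, 66, 2, 3, 1, 6, 9, 15, 21)
)
df$bucket <- weighted_buckets(df$x, df$weight, 3)

# Observations per bucket: 77, 84 and 55 out of 216.
bucket_totals <- tapply(df$weight, df$bucket, sum)
bucket_totals
```

Nothing is expanded, so the cost is one sort plus one cumulative sum over the compact rows. One caveat: because rows are assigned individually, two rows with the same x can end up on either side of a bucket boundary, which cutting on breakpoints would not allow.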
