Summing values after every third position in a data frame in R
I am new to R. I have a data frame like the following:

    df = data.frame(
      id    = c("entry_1","entry_1","entry_1","entry_2","entry_2","entry_2",
                "entry_3","entry_4","entry_4","entry_4","entry_4"),
      start = c(20,20,20,37,37,37,68,10,10,10,10),
      end   = c(50,50,50,78,78,78,200,94,94,94,94),
      pos   = c(14,34,21,50,18,70,101,35,2,56,67),
      hits  = c(12,34,17,89,45,87,1,5,6,3,26))

         id start end pos hits
    entry_1    20  50  14   12
    entry_1    20  50  34   34
    entry_1    20  50  21   17
    entry_2    37  78  50   89
    entry_2    37  78  18   45
    entry_2    37  78  70   87
    entry_3    68 200 101    1
    entry_4    10  94  35    5
    entry_4    10  94   2    6
    entry_4    10  94  56    3
    entry_4    10  94  67   26
For each entry I want to iterate over the data frame in 3 different modes. For example, for entry_1, mode_1 = seq(20,50,3), mode_2 = seq(21,50,3), and mode_3 = seq(22,50,3). I want to sum all the values in column "hits" whose corresponding values in column "pos" fall in mode_1, mode_2, or mode_3, and generate a data frame as follows:
         id mode_1 mode_2 mode_3
    entry_1      0     17     34
    entry_2     87     89      0
    entry_3      1      0      0
    entry_4     26      8      0
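To make the mode definition concrete, here is a minimal sketch restricted to the three entry_1 rows. The names modes and mode_sums are illustrative and do not come from the original post.

```r
# Values copied from the entry_1 rows of the example data
start <- 20
end   <- 50
pos   <- c(14, 34, 21)
hits  <- c(12, 34, 17)

# The three modes: every third position, starting at start, start + 1 and start + 2
modes <- list(
  mode_1 = seq(start, end, 3),
  mode_2 = seq(start + 1, end, 3),
  mode_3 = seq(start + 2, end, 3)
)

# For each mode, sum the hits whose pos is a member of that mode
mode_sums <- sapply(modes, function(mode) sum(hits[pos %in% mode]))
print(mode_sums)
```

Position 14 lies before the start of the interval, so its 12 hits are not counted in any mode.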
I tried the following code:

    mode_1=0
    mode_2=0
    mode_3=0
    mode_1_sum=0
    mode_2_sum=0
    mode_3_sum=0
    for(i in dim(df)[1])
    {
      if(df$pos[i] %in% seq(df$start[i],df$end[i],3))
      {
        mode_1_sum=mode_1_sum+df$hits[i]
        print(mode_1_sum)
      }
      mode_1=mode_1_sum+counts
      print(mode_1)
      ifelse(df$pos[i] %in% seq(df$start[i]+1,df$end[i],3))
      {
        mode_2_sum=mode_2_sum+df$hits[i]
        print(mode_2_sum)
      }
      mode_2_sum=mode_2_sum+counts
      print(mode_2)
      ifelse(df$pos[i] %in% seq(df$start[i]+2,df$end[i],3))
      {
        mode_3_sum=mode_3_sum+df$hits[i]
        print(mode_3_sum)
      }
      mode_3_sum=mode_3_sum+counts
      print(mode_3_sum)
    }
But the above code only prints 26. Can anyone guide me on how to generate the desired output, please? I can provide more details if needed. Thanks in advance.
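A likely reason for the single 26: dim(df)[1] is one number (11), so for(i in dim(df)[1]) runs the loop body once, with i = 11, instead of once per row. The sketch below compares that with seq_len(nrow(df)). The helper vectors single and every are illustrative names only. Separately, counts is not defined anywhere in the code shown.

```r
df <- data.frame(
  start = c(20, 20, 20, 37, 37, 37, 68, 10, 10, 10, 10),
  end   = c(50, 50, 50, 78, 78, 78, 200, 94, 94, 94, 94),
  pos   = c(14, 34, 21, 50, 18, 70, 101, 35, 2, 56, 67),
  hits  = c(12, 34, 17, 89, 45, 87, 1, 5, 6, 3, 26)
)

# dim(df)[1] is the single value 11, so this loop visits only i = 11
single <- integer(0)
for (i in dim(df)[1]) single <- c(single, i)

# seq_len(nrow(df)) is 1, 2, ..., 11, so this loop visits every row
every <- integer(0)
for (i in seq_len(nrow(df))) every <- c(every, i)

# Row 11 has pos 67, which is a member of seq(10, 94, 3),
# so its hits value is the only one added to mode_1_sum
last_in_mode_1 <- df$pos[11] %in% seq(df$start[11], df$end[11], 3)

print(single)
print(every)
print(df$hits[11])
```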
It's not the most elegant solution, but it works.

    m <- 3  # number of modes you want
    # strict comparisons: a pos equal to start or end counts as outside the interval
    foo <- ((df$pos - df$start) %% m + 1) * (df$start < df$pos) * (df$end > df$pos)
    tab <- matrix(0, nrow(df), m)
    for (i in 1:m) tab[foo == i, i] <- df$hits[foo == i]
    aggregate(tab, list(df$id), FUN = sum)
    #   Group.1 V1 V2 V3
    # 1 entry_1  0 17 34
    # 2 entry_2 87 89  0
    # 3 entry_3  1  0  0
    # 4 entry_4 26  8  0
-- Explanation --
First, find the indices of df$pos that are both bigger than df$start and smaller than df$end. These comparisons return TRUE or FALSE, which act as 1 and 0 when multiplied. Next, take the difference between df$pos and df$start, take it mod 3 (which gives a vector of 0s, 1s, and 2s), and add 1 to get the right mode. Multiply these two things together, so that values that fall within the interval retain the right mode and values that fall outside the interval become 0.
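As a worked illustration of that computation on the example data, the sketch below splits it into its two factors. The intermediate names inside and offset are assumptions added here for readability; the answer computes foo in a single expression.

```r
start <- c(20, 20, 20, 37, 37, 37, 68, 10, 10, 10, 10)
end   <- c(50, 50, 50, 78, 78, 78, 200, 94, 94, 94, 94)
pos   <- c(14, 34, 21, 50, 18, 70, 101, 35, 2, 56, 67)
m <- 3

# 1 where pos lies strictly between start and end, 0 otherwise
inside <- (start < pos) * (end > pos)

# Offset from start modulo m, shifted so the modes are numbered 1..m.
# %% returns a non-negative result in R even when pos - start is negative.
offset <- (pos - start) %% m + 1

# Rows outside their interval are zeroed out; the others keep their mode number
foo <- offset * inside
print(foo)
```

Rows 1, 5 and 9 (pos 14, 18 and 2) fall before their start value, so they get mode 0 and contribute to no column.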
Next, create an empty matrix to contain the values. Then, use a for-loop to fill in the matrix. Finally, aggregate the matrix.
I tried looking for a quicker solution, but the main problem I cannot work around is the varying intervals for each row.
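One possible loop-free variant, offered as a sketch and not as the answer's own method: because the modular arithmetic is already computed per row, the matrix can be filled with a two-column index matrix and summed per id with base R's rowsum(). The names keep and res and the mode_1..mode_3 column labels are assumptions added here.

```r
df <- data.frame(
  id    = c("entry_1", "entry_1", "entry_1", "entry_2", "entry_2", "entry_2",
            "entry_3", "entry_4", "entry_4", "entry_4", "entry_4"),
  start = c(20, 20, 20, 37, 37, 37, 68, 10, 10, 10, 10),
  end   = c(50, 50, 50, 78, 78, 78, 200, 94, 94, 94, 94),
  pos   = c(14, 34, 21, 50, 18, 70, 101, 35, 2, 56, 67),
  hits  = c(12, 34, 17, 89, 45, 87, 1, 5, 6, 3, 26)
)
m <- 3

# Same mode computation as in the answer (strict comparisons;
# switch to <= and >= if positions equal to start or end should count)
foo <- ((df$pos - df$start) %% m + 1) * (df$start < df$pos) * (df$end > df$pos)

# One row per data row, one column per mode
tab <- matrix(0, nrow(df), m, dimnames = list(NULL, paste0("mode_", 1:m)))

# Fill cell (row, mode) for every row that falls inside its interval
keep <- which(foo > 0)
tab[cbind(keep, foo[keep])] <- df$hits[keep]

# Sum the rows of tab within each id
res <- rowsum(tab, df$id)
print(res)
```

The result is a matrix with one row per id; as.data.frame(res) converts it if a data frame is needed.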