python - How to sum across many columns with pandas groupby?
I have a dataframe that looks like this:

day  type  col  d_1  d_2  d_3  d_4  d_5...
1    a     1    1    0    1    0
1    a     2    1    0    1    0
2    b     1    1    1    0    0

That is, I have one normal column (col) and many columns prefixed with d_.
I need to group by day and type, and I want to compute the sum of the values in every d_ column for every day-type combination. I also need to apply other aggregation functions to other columns in the data (such as col in the example).
I can use:
agg_df=df.groupby(['day','type']).agg({'d_1': 'sum', 'col': 'mean'})
But this computes the sum for only one d_ column. How can I specify all the d_ columns in my data?
In other words, I would like to write something like:
agg_df=df.groupby(['day','type']).agg({'d_*': 'sum', 'col': 'mean'})
So the expected output is:

day  type  col  d_1  d_2  d_3  d_4  d_5...
1    a     1.5  2    0    2    0    ...
2    b     1    1    1    0    0

As you can see, col is aggregated by its mean, while the d_ columns are summed.
Thanks for your help!
You can use filter to select the d_ columns:

In [23]: df.groupby(['day','type'], as_index=False)[df.filter(regex='d_.*').columns].sum()
Out[23]:
   day type  d_1  d_2  d_3  d_4
0    1    a    2    0    2    0
1    2    b    1    1    0    0

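To see which columns filter actually picks up, here is a minimal self-contained sketch. The sample frame is rebuilt from the question; the type value 'a' for the first two rows and the extra id_x column are assumptions added for illustration. Since filter(regex=...) matches with re.search, an unanchored pattern like 'd_.*' also matches a name such as id_x, so anchoring it as '^d_' may be the safer choice.

```python
import pandas as pd

# Sample data rebuilt from the question; type 'a' is assumed.
df = pd.DataFrame({
    'day':  [1, 1, 2],
    'type': ['a', 'a', 'b'],
    'col':  [1, 2, 1],
    'd_1':  [1, 1, 1],
    'd_2':  [0, 0, 1],
    'd_3':  [1, 1, 0],
    'd_4':  [0, 0, 0],
    'id_x': [7, 8, 9],  # hypothetical extra column, not in the question
})

# Unanchored pattern: matches 'd_' anywhere in the name, including 'id_x'.
loose = df.filter(regex='d_.*').columns.tolist()

# Anchored pattern: only names that start with 'd_'.
strict = df.filter(regex='^d_').columns.tolist()

# Sum only the d_ columns for every day-type combination.
summed = df.groupby(['day', 'type'], as_index=False)[strict].sum()
print(summed)
```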
If you want to apply all the functions in one shot, build the aggregation dict first:

import numpy as np

dic = {}
# one entry per d_ column, each aggregated with a sum
dic.update({i: np.sum for i in df.filter(regex='d_.*').columns})
# col is aggregated with a mean
dic.update({'col': np.mean})

In [48]: df.groupby(['day','type'], as_index=False).agg(dic)
#Out[48]:
#   day type  d_2  d_3  d_1  col  d_4
#0    1    a    0    2    2  1.5    0
#1    2    b    1    0    1  1.0    0

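As a self-contained variant of the one-shot approach, here is a sketch that builds the aggregation dict with a dict comprehension and uses the string names 'sum' and 'mean' (as in the question) rather than the NumPy functions. The sample data is rebuilt from the question, with the type value 'a' assumed for the first two rows; agg_spec is just an illustrative name.

```python
import pandas as pd

# Sample data rebuilt from the question; type 'a' is assumed.
df = pd.DataFrame({
    'day':  [1, 1, 2],
    'type': ['a', 'a', 'b'],
    'col':  [1, 2, 1],
    'd_1':  [1, 1, 1],
    'd_2':  [0, 0, 1],
    'd_3':  [1, 1, 0],
    'd_4':  [0, 0, 0],
})

# Every column whose name starts with 'd_' gets summed.
agg_spec = {c: 'sum' for c in df.filter(regex='^d_').columns}
# The remaining column of interest gets its own aggregation.
agg_spec['col'] = 'mean'

agg_df = df.groupby(['day', 'type'], as_index=False).agg(agg_spec)
print(agg_df)
```

Other columns can be handled the same way by adding more entries to agg_spec before calling agg.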