python 3.x - Pandas: Calculate Median of Group over Columns
Given the following data frame:

    import pandas as pd

    df = pd.DataFrame({'col1': ['a', 'a', 'a', 'a', 'b', 'b'],
                       'col2': ['aa', 'aa', 'bb', 'bb', 'bb', 'bb'],
                       'col3': [2, 3, 4, 5, 4, 2],
                       'col4': [0, 1, 2, 3, 4, 2]})
    df

      col1 col2  col3  col4
    0    a   aa     2     0
    1    a   aa     3     1
    2    a   bb     4     2
    3    a   bb     5     3
    4    b   bb     4     4
    5    b   bb     2     2
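I would like, as efficiently as possible (i.e. via groupby and lambda x, or better), to find the median of columns 3 and 4 for each distinct group of columns 1 and 2. The median should be taken over the values of both columns pooled together within each group.

The desired result is as follows: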

      col1 col2  col3  col4  median
    0    a   aa     2     0     1.5
    1    a   aa     3     1     1.5
    2    a   bb     4     2     3.5
    3    a   bb     5     3     3.5
    4    b   bb     4     4     3
    5    b   bb     2     2     3

Thanks in advance!
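You had the right idea: group by col1 and col2, then calculate the median.

    import numpy as np

    # Pooled median of col3 and col4 within each (col1, col2) group
    m = df.groupby(['col1', 'col2'])[['col3', 'col4']].apply(np.median)
    m.name = 'median'

    # Attach each group's median to its rows
    print(df.join(m, on=['col1', 'col2']))

      col1 col2  col3  col4  median
    0    a   aa     2     0     1.5
    1    a   aa     3     1     1.5
    2    a   bb     4     2     3.5
    3    a   bb     5     3     3.5
    4    b   bb     4     4     3.0
    5    b   bb     2     2     3.0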
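
If you would rather not rely on apply, one possible alternative is to reshape col3 and col4 into a single long column first and then use the built-in groupby median. The following is only a sketch under the assumption that you want one pooled median per (col1, col2) group, as in the desired result; the names `long_df` and `result` are introduced here for illustration and are not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'a', 'a', 'b', 'b'],
                   'col2': ['aa', 'aa', 'bb', 'bb', 'bb', 'bb'],
                   'col3': [2, 3, 4, 5, 4, 2],
                   'col4': [0, 1, 2, 3, 4, 2]})

# Stack col3 and col4 into one 'value' column, keeping the group keys
long_df = df.melt(id_vars=['col1', 'col2'],
                  value_vars=['col3', 'col4'],
                  value_name='value')

# One pooled median per (col1, col2) group, as a named Series
m = long_df.groupby(['col1', 'col2'])['value'].median().rename('median')

# Broadcast each group's median back onto the original rows
result = df.join(m, on=['col1', 'col2'])
print(result)
```

This should give the same median column as the apply version. Whether it is actually faster will depend on the size and shape of your data, so it is worth timing both on a realistic sample.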