python 3.x - Pandas: Calculate Median of Group over Columns


Given the following data frame:

    import pandas as pd

    df = pd.DataFrame({'col1': ['a', 'a', 'a', 'a', 'b', 'b'],
                       'col2': ['aa', 'aa', 'bb', 'bb', 'bb', 'bb'],
                       'col3': [2, 3, 4, 5, 4, 2],
                       'col4': [0, 1, 2, 3, 4, 2]})
    df

      col1 col2  col3  col4
    0    a   aa     2     0
    1    a   aa     3     1
    2    a   bb     4     2
    3    a   bb     5     3
    4    b   bb     4     4
    5    b   bb     2     2

I would like to find, as efficiently as possible (i.e. via groupby and lambda x, or better), the median of columns 3 and 4 for each distinct group of columns 1 and 2.

The desired result is as follows:

      col1 col2  col3  col4  median
    0    a   aa     2     0     1.5
    1    a   aa     3     1     1.5
    2    a   bb     4     2     3.5
    3    a   bb     5     3     3.5
    4    b   bb     4     4     3
    5    b   bb     2     2     3

Thanks in advance!

You had the right idea: group by col1 and col2, then calculate the median of col3 and col4 pooled together within each group.

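    import numpy as np

    # np.median flattens each group's col3/col4 block, giving one value per group
    m = df.groupby(['col1', 'col2'])[['col3', 'col4']].apply(np.median)
    m.name = 'median'

    # join the per-group medians back onto the original rows
    print(df.join(m, on=['col1', 'col2']))

      col1 col2  col3  col4  median
    0    a   aa     2     0     1.5
    1    a   aa     3     1     1.5
    2    a   bb     4     2     3.5
    3    a   bb     5     3     3.5
    4    b   bb     4     4     3.0
    5    b   bb     2     2     3.0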
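As a possible alternative that avoids calling a Python function per group, the two value columns can first be reshaped into one long column and then aggregated with the built-in groupby median. This is only a sketch of that idea, not the method from the answer above; the names `long_df`, `medians` and `result` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'a', 'a', 'b', 'b'],
                   'col2': ['aa', 'aa', 'bb', 'bb', 'bb', 'bb'],
                   'col3': [2, 3, 4, 5, 4, 2],
                   'col4': [0, 1, 2, 3, 4, 2]})

# Stack col3 and col4 into a single 'value' column, keeping the group keys.
long_df = df.melt(id_vars=['col1', 'col2'], value_vars=['col3', 'col4'])

# One median per (col1, col2) group over the pooled col3/col4 values.
medians = long_df.groupby(['col1', 'col2'])['value'].median().rename('median')

# Attach each group's median to the original rows.
result = df.join(medians, on=['col1', 'col2'])
print(result)
```

Whether this is actually faster than the `apply` version would depend on the size and shape of the data, so it is worth timing both on a realistic frame.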
