Python: unique combinations of values in selected columns of a pandas DataFrame, and their counts

June 15, 2011

I have data in a pandas DataFrame as follows:

    import pandas as pd

    df1 = pd.DataFrame({'a': ['yes','yes','yes','yes','no','no','yes','yes','yes','no'],
                        'b': ['yes','no','no','no','yes','yes','no','yes','yes','no']})

So the data looks like this:

         a    b
    0  yes  yes
    1  yes   no
    2  yes   no
    3  yes   no
    4   no  yes
    5   no  yes
    6  yes   no
    7  yes  yes
    8  yes  yes
    9   no   no

I want to transform this data frame. The expected output can be built with the following Python script:

    output = pd.DataFrame({'a': ['no','no','yes','yes'],
                           'b': ['no','yes','no','yes'],
                           'count': [1, 2, 4, 3]})

So the expected output looks like this:

         a    b  count
    0   no   no      1
    1   no  yes      2
    2  yes   no      4
    3  yes  yes      3

Actually, I can find all the combinations and count them using the following command:

    mytable = df1.groupby(['a','b']).size()

However, this puts each combination into a single (index) column. I'd like to separate each value of the combination into its own column and add one more column with the count. Is that possible? Any suggestions? Thanks in advance.
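The snippet below is a small sketch that reuses the question's `df1`. It shows why the combinations end up "in a single column": `groupby(...).size()` returns a Series, and the `(a, b)` pairs live in a two-level MultiIndex rather than in ordinary columns.

```python
import pandas as pd

# Same data as in the question.
df1 = pd.DataFrame({'a': ['yes','yes','yes','yes','no','no','yes','yes','yes','no'],
                    'b': ['yes','no','no','no','yes','yes','no','yes','yes','no']})

mytable = df1.groupby(['a', 'b']).size()

# The result is a Series, not a DataFrame.
print(type(mytable).__name__)       # Series
# The grouping keys form a MultiIndex with one level per grouped column.
print(list(mytable.index.names))    # ['a', 'b']
# Each index entry is one (a, b) combination, sorted by default.
print(mytable.index.tolist())       # [('no', 'no'), ('no', 'yes'), ('yes', 'no'), ('yes', 'yes')]
```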

You can group by columns 'a' and 'b', call size, then call reset_index, and rename the generated column:

    In [26]: df1.groupby(['a','b']).size().reset_index().rename(columns={0:'count'})
    Out[26]:
         a    b  count
    0   no   no      1
    1   no  yes      2
    2  yes   no      4
    3  yes  yes      3
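Here is the same one-liner as a self-contained script, with a minor variation alongside it. `Series.reset_index` accepts a `name` argument that sets the label of the values column directly, which skips the separate `rename` call. This is an alternative spelling, not what the answer above uses.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['yes','yes','yes','yes','no','no','yes','yes','yes','no'],
                    'b': ['yes','no','no','no','yes','yes','no','yes','yes','no']})

# Approach from the answer: count rows per (a, b) group, turn the group
# keys back into columns, then rename the generated column 0 to 'count'.
counts = df1.groupby(['a', 'b']).size().reset_index().rename(columns={0: 'count'})

# Alternative: name the values column while resetting the index.
counts_alt = df1.groupby(['a', 'b']).size().reset_index(name='count')

print(counts)
print(counts.equals(counts_alt))  # True
```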

Update

A little explanation: grouping on the two columns groups together the rows whose a and b values are the same. Calling size then returns the number of rows in each unique group:

    In [202]: df1.groupby(['a','b']).size()
    Out[202]:
    a    b
    no   no     1
         yes    2
    yes  no     4
         yes    3
    dtype: int64
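The sketch below makes the distinction between "rows per group" and "number of groups" concrete. `size()` gives the row count of each `(a, b)` group, and these counts add up to the length of the frame. The number of distinct combinations is exposed separately as the GroupBy attribute `ngroups`.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['yes','yes','yes','yes','no','no','yes','yes','yes','no'],
                    'b': ['yes','no','no','no','yes','yes','no','yes','yes','no']})

grouped = df1.groupby(['a', 'b'])

# size(): number of rows in each (a, b) group, indexed by the combination.
sizes = grouped.size()
print(sizes.loc[('no', 'yes')])     # 2

# Every row falls into exactly one group, so the counts add up to len(df1).
print(sizes.sum() == len(df1))      # True

# ngroups: number of distinct (a, b) combinations.
print(grouped.ngroups)              # 4
```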

To turn the grouped index levels back into regular columns, call reset_index:

    In [203]: df1.groupby(['a','b']).size().reset_index()
    Out[203]:
         a    b  0
    0   no   no  1
    1   no  yes  2
    2  yes   no  4
    3  yes  yes  3
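One detail worth checking: the Series returned by `size()` here has no name, so `reset_index()` stores its values under the integer label `0`, not the string `'0'`. That is why the rename uses `{0: 'count'}`. A quick sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['yes','yes','yes','yes','no','no','yes','yes','yes','no'],
                    'b': ['yes','no','no','no','yes','yes','no','yes','yes','no']})

counts = df1.groupby(['a', 'b']).size().reset_index()

# The former index levels become ordinary columns. The unnamed Series
# values land in a column labelled with the integer 0.
print(counts.columns.tolist())      # ['a', 'b', 0]
print(counts[0].tolist())           # [1, 2, 4, 3]
```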

This restores the index levels as columns, but the size aggregation becomes a generated column named 0, so we have to rename it:

    In [204]: df1.groupby(['a','b']).size().reset_index().rename(columns={0:'count'})
    Out[204]:
         a    b  count
    0   no   no      1
    1   no  yes      2
    2  yes   no      4
    3  yes  yes      3
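To confirm that this chain produces exactly the frame the question asked for, you can build the question's `output` and compare the two. The comparison below uses `to_dict('list')`, a deliberately simple check of column names and values that ignores finer details such as dtypes.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['yes','yes','yes','yes','no','no','yes','yes','yes','no'],
                    'b': ['yes','no','no','no','yes','yes','no','yes','yes','no']})

# Expected output, as given in the question.
output = pd.DataFrame({'a': ['no','no','yes','yes'],
                       'b': ['no','yes','no','yes'],
                       'count': [1, 2, 4, 3]})

result = df1.groupby(['a', 'b']).size().reset_index().rename(columns={0: 'count'})

# Compare column by column as plain lists.
print(result.to_dict('list') == output.to_dict('list'))  # True
```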

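groupby does accept an as_index argument. You could set it to False so the grouped columns don't become the index, but size still generates a Series, so you'd have to restore the indices and so on anyway. (That was pandas' behaviour when this was written; newer versions return a DataFrame with a column named size instead.)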

    In [205]: df1.groupby(['a','b'], as_index=False).size()
    Out[205]:
    a    b
    no   no     1
         yes    2
    yes  no     4
         yes    3
    dtype: int64
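For readers on a recent pandas, here is a sketch of the `as_index=False` route. It assumes pandas 1.1 or later, where (as far as I know) `size()` on an `as_index=False` groupby returns a DataFrame whose count column is labelled `'size'`. On older versions it behaves as in the session above.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['yes','yes','yes','yes','no','no','yes','yes','yes','no'],
                    'b': ['yes','no','no','no','yes','yes','no','yes','yes','no']})

# Assumes pandas >= 1.1: the result is already a DataFrame with columns a, b, size.
counts = df1.groupby(['a', 'b'], as_index=False).size()

# Rename to match the column name the question asked for.
counts = counts.rename(columns={'size': 'count'})
print(counts)
```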
