Python - unique combinations of values in selected columns in a pandas data frame, and their count
I have data in a pandas data frame as follows:
df1 = pd.DataFrame({'a':['yes','yes','yes','yes','no','no','yes','yes','yes','no'], 'b':['yes','no','no','no','yes','yes','no','yes','yes','no']})
So, the data looks like this:
----------------------------
index    a      b
----------------------------
0        yes    yes
1        yes    no
2        yes    no
3        yes    no
4        no     yes
5        no     yes
6        yes    no
7        yes    yes
8        yes    yes
9        no     no
----------------------------
I would like to transform it into another data frame. The expected output can be shown in the following Python script:
output = pd.DataFrame({'a':['no','no','yes','yes'],'b':['no','yes','no','yes'],'count':[1,2,4,3]})
So, the expected output looks like this:
--------------------------------------------
index    a      b      count
--------------------------------------------
0        no     no     1
1        no     yes    2
2        yes    no     4
3        yes    yes    3
--------------------------------------------
Actually, I can find all the combinations and count them using the following command:
mytable = df1.groupby(['a','b']).size()
However, it turns out that such combinations end up in a single column. I would like to separate each value in a combination into a different column, and also add one more column for the result of the counting. Is it possible to do that? May I have your suggestions? Thank you in advance.
You can groupby on cols 'a' and 'b', call size, then reset_index, and rename the generated column:
In [26]: df1.groupby(['a','b']).size().reset_index().rename(columns={0:'count'})
Out[26]:
     a    b  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3
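For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same chain. The variable name counts is just an illustrative choice and not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    'a': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
    'b': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no'],
})

counts = (
    df1.groupby(['a', 'b'])           # one group per unique (a, b) pair
       .size()                        # number of rows in each group
       .reset_index()                 # move a and b back into columns
       .rename(columns={0: 'count'})  # the size column is labelled 0 by default
)
print(counts)
```

The result should have the same columns and values as the expected output in the question.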
Update
A little explanation: grouping on the 2 columns groups the rows where the a and b values are the same, and calling size then returns the number of rows in each unique group:
In[202]: df1.groupby(['a','b']).size()
Out[202]:
a    b
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64
So now, to restore the grouped columns, we call reset_index:
In[203]: df1.groupby(['a','b']).size().reset_index()
Out[203]:
     a    b  0
0   no   no  1
1   no  yes  2
2  yes   no  4
3  yes  yes  3
This restores the indices, but the size aggregation has been turned into a generated column named 0, so we have to rename it:
In[204]: df1.groupby(['a','b']).size().reset_index().rename(columns={0:'count'})
Out[204]:
     a    b  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3
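As a possible shortcut (not part of the original answer), Series.reset_index takes a name argument that labels the values column directly, so the separate rename step can be skipped. A small sketch, with counts_named as an illustrative variable name:

```python
import pandas as pd

df1 = pd.DataFrame({
    'a': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
    'b': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no'],
})

# size() returns a Series indexed by (a, b); name= labels its values
# when the index levels are turned back into columns.
counts_named = df1.groupby(['a', 'b']).size().reset_index(name='count')
print(counts_named)
```

This should give the same frame as the reset_index().rename(...) chain above.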
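groupby does accept an arg as_index, which we could have set to False so that it doesn't make the grouped columns the index, but in the pandas version used here this still generates a Series, and you'd still have to restore the indices and so on: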
In[205]: df1.groupby(['a','b'], as_index=False).size()
Out[205]:
a    b
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64
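The return type of this call appears to depend on the pandas version: as far as I know, pandas 1.1 and later return a DataFrame with a column named size, rather than the Series shown above. The sketch below is written under that assumption and handles either case. The variable name result is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'a': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
    'b': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no'],
})

result = df1.groupby(['a', 'b'], as_index=False).size()

if isinstance(result, pd.Series):
    # Older behaviour (as in the output above): a Series indexed by (a, b)
    result = result.reset_index(name='count')
else:
    # Assumed newer behaviour: a DataFrame with columns a, b and size
    result = result.rename(columns={'size': 'count'})

print(result)
```

Either branch should end with the columns a, b and count.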