pandas - Aggregate/Remove duplicate rows in DataFrame based on swapped index levels

March 15, 2013

sample input

    import pandas as pd

    df = pd.DataFrame([
            ['a', 'b', 1, 5],
            ['b', 'c', 2, 2],
            ['b', 'a', 1, 1],
            ['c', 'b', 1, 3]],
            columns=['from', 'to', 'type', 'value'])

    df = df.set_index(['from', 'to', 'type'])

which looks like this:

                  value
    from to type
    a    b  1         5
    b    c  2         2
         a  1         1
    c    b  1         3

goal

I want to remove "duplicate" rows in the following sense: for each row with an arbitrary index (from, to, type), if there exists a row (to, from, type), the value of the second row should be added to the first row and the second row should be dropped. In the example above, the row (b, a, 1) with value 1 should be added to the first row and then dropped, leading to the following desired result.

sample result

                  value
    from to type
    a    b  1         6
    b    c  2         2
    c    b  1         3

This is my best try so far. It feels unnecessarily verbose and clunky:

    # aggregate val of rows where (from,to,type) == (to,from,type)
    df2 = df.reset_index()
    df3 = df2.rename(columns={'from': 'to', 'to': 'from'})
    df_both = df.join(df3.set_index(
                        ['from', 'to', 'type']),
                        rsuffix='_b').sum(axis=1)

    # then remove the second, i.e. (to,from,t) row
    rows_to_keep = []
    rows_to_remove = []
    for a, b, t in df_both.index:
        if (b, a, t) in df_both.index and (b, a, t) not in rows_to_keep:
            rows_to_keep.append((a, b, t))
            rows_to_remove.append((b, a, t))

    df_final = df_both.drop(rows_to_remove)
    df_final

Especially the second "de-duplication" step feels unpythonic. (How) can I improve these steps?
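One possible simplification, offered as a sketch rather than a definitive answer: build an order-independent key by sorting each (from, to) pair, then group on that key plus type. The helper column names `lo` and `hi` are made up for illustration, and the sketch assumes the labels in from and to are mutually sortable (here, plain strings). Taking the first from/to of each group keeps the orientation of the row that appears first, as the goal requires.

```python
import pandas as pd

df = pd.DataFrame(
    [["a", "b", 1, 5],
     ["b", "c", 2, 2],
     ["b", "a", 1, 1],
     ["c", "b", 1, 3]],
    columns=["from", "to", "type", "value"],
)

# Sort each (from, to) pair so that (a, b) and (b, a) share one key.
# "lo" and "hi" are hypothetical helper columns, not part of the original data.
pairs = [tuple(sorted(p)) for p in zip(df["from"], df["to"])]
flat = df.assign(lo=[p[0] for p in pairs], hi=[p[1] for p in pairs])

# Sum the values per key; keep the from/to labels of the first row in each group.
# sort=False keeps the groups in order of first appearance.
grouped = (
    flat.groupby(["lo", "hi", "type"], sort=False, as_index=False)
        .agg({"from": "first", "to": "first", "value": "sum"})
)

result = grouped.set_index(["from", "to", "type"])[["value"]]
print(result)
```

On the sample input this leaves three rows, (a, b, 1), (b, c, 2) and (c, b, 1), with values 6, 2 and 3. The explicit keep/remove loop disappears because the swapped rows fall into the same group.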

Not sure how much better this is, but it's different:

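    import pandas as pd
    from collections import Counter

    df = pd.DataFrame([
            ['a', 'b', 1, 5],
            ['b', 'c', 2, 2],
            ['b', 'a', 1, 1],
            ['c', 'b', 1, 3]],
            columns=['from', 'to', 'type', 'value'])

    df = df.set_index(['from', 'to', 'type'])
    ls = df.to_records()
    ls = list(ls)
    ls2 = []
    for l in ls:
        i = 0
        # repeat each (from, to, type) key `value` times, so counting sums the values
        while i < l[3]:
            ls2.append(list(l)[:3])
            i += 1
    # sort only the (from, to) pair so that swapped rows share one key
    counted = Counter(tuple(sorted(entry[:2])) + (entry[2],) for entry in ls2)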
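The counting idea above can also be carried through without repeating each key `value` times, by adding the value to the counter directly. This is a sketch under the same assumptions as the sample data (integer values, string labels in from and to); the names `totals` and `result` are made up here. Note that, unlike the desired result in the question, this variant stores every pair in sorted order, so the row (c, b, 1) comes back as (b, c, 1).

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame(
    [["a", "b", 1, 5],
     ["b", "c", 2, 2],
     ["b", "a", 1, 1],
     ["c", "b", 1, 3]],
    columns=["from", "to", "type", "value"],
).set_index(["from", "to", "type"])

totals = Counter()
for (src, dst, kind), value in df["value"].items():
    # Sort only the (from, to) pair; the type stays in third position.
    key = tuple(sorted((src, dst))) + (kind,)
    totals[key] += int(value)

# Turn the counter back into a Series with a three-level index.
result = pd.Series(
    list(totals.values()),
    index=pd.MultiIndex.from_tuples(list(totals.keys()), names=["from", "to", "type"]),
    name="value",
)
print(result)
```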
