pandas - Aggregate/remove duplicate rows in a DataFrame based on swapped index levels


sample input

    import pandas as pd

    df = pd.DataFrame([
            ['a', 'b', 1, 5],
            ['b', 'c', 2, 2],
            ['b', 'a', 1, 1],
            ['c', 'b', 1, 3]],
            columns=['from', 'to', 'type', 'value'])

    df = df.set_index(['from', 'to', 'type'])

which looks like this:

                  value
    from to type
    a    b  1         5
    b    c  2         2
         a  1         1
    c    b  1         3

goal

I want to remove "duplicate" rows in the following sense: for each row with an arbitrary index (from, to, type), if there exists a row (to, from, type), the value of the second row should be added to the first row and the second row should be dropped. In the example above, the row (b, a, 1) with value 1 should be added to the first row and then dropped, leading to the following desired result.

sample result

                  value
    from to type
    a    b  1         6
    b    c  2         2
    c    b  1         3

This is my best try so far. It feels unnecessarily verbose and clunky:

    # aggregate val of rows with (from,to,type) == (to,from,type)
    df2 = df.reset_index()
    df3 = df2.rename(columns={'from': 'to', 'to': 'from'})
    df_both = df.join(df3.set_index(
                        ['from', 'to', 'type']),
                        rsuffix='_b').sum(axis=1)

    # remove the second, i.e. (to,from,t) row
    rows_to_keep = []
    rows_to_remove = []
    for a, b, t in df_both.index:
        if (b, a, t) in df_both.index and (b, a, t) not in rows_to_keep:
            rows_to_keep.append((a, b, t))
            rows_to_remove.append((b, a, t))

    df_final = df_both.drop(rows_to_remove)
    df_final

Especially the second "de-duplication" step feels unpythonic. (How) can I improve these steps?

I'm not sure how much better this is, but it's different:

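    import pandas as pd
    from collections import Counter

    df = pd.DataFrame([
            ['a', 'b', 1, 5],
            ['b', 'c', 2, 2],
            ['b', 'a', 1, 1],
            ['c', 'b', 1, 3]],
            columns=['from', 'to', 'type', 'value'])

    df = df.set_index(['from', 'to', 'type'])
    ls = df.to_records()
    ls = list(ls)
    ls2 = []
    for l in ls:
        # repeat each (from, to, type) key `value` times
        i = 0
        while i < l[3]:
            ls2.append(list(l)[:3])
            i += 1
    # sort only the two endpoints so (a, b) and (b, a) share one key;
    # sorting all three fields would compare str with int on Python 3
    counted = Counter(tuple(sorted(entry[:2])) + (entry[2],) for entry in ls2)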
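Another way to reach the result described in the goal, sketched here as a possibility rather than a definitive answer, is to build a direction-independent key from the two endpoints and let `groupby` do both the summing and the de-duplication. The helper column names `lo` and `hi` are made up for this illustration. The `'first'` aggregation is what keeps the orientation of whichever row appears first.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 'b', 1, 5],
     ['b', 'c', 2, 2],
     ['b', 'a', 1, 1],
     ['c', 'b', 1, 3]],
    columns=['from', 'to', 'type', 'value'],
).set_index(['from', 'to', 'type'])

flat = df.reset_index()

# Direction-independent key: the two endpoints in sorted order.
# `lo` and `hi` are hypothetical helper columns, not part of the original data.
pairs = [tuple(sorted(p)) for p in zip(flat['from'], flat['to'])]
flat['lo'] = [p[0] for p in pairs]
flat['hi'] = [p[1] for p in pairs]

result = (
    flat.groupby(['lo', 'hi', 'type'], sort=False)   # keep first-seen order
        .agg({'from': 'first', 'to': 'first', 'value': 'sum'})
        .reset_index()
        .set_index(['from', 'to', 'type'])[['value']]
)

print(result)
```

With the sample input this should leave three rows, with (a, b, 1) carrying the combined value 6. If the orientation of the surviving row does not matter, `lo` and `hi` could serve directly as the new index levels.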
