pandas - Aggregate/Remove duplicate rows in DataFrame based on swapped index levels
Sample input

    import pandas as pd

    df = pd.DataFrame([
        ['a', 'b', 1, 5],
        ['b', 'c', 2, 2],
        ['b', 'a', 1, 1],
        ['c', 'b', 1, 3]],
        columns=['from', 'to', 'type', 'value'])
    df = df.set_index(['from', 'to', 'type'])

Which looks like this:

                  value
    from to type
    a    b  1         5
    b    c  2         2
         a  1         1
    c    b  1         3

Goal
I want to remove "duplicate" rows in the following sense: for each row with an arbitrary index (from, to, type), if there exists a row (to, from, type), the value of the second row should be added to the first row and the second row should be dropped. In the example above, the row (b, a, 1) with value 1 should be added to the first row and then dropped, leading to the following desired result.
Sample result

                  value
    from to type
    a    b  1         6
    b    c  2         2
    c    b  1         3

This is my best try so far. It feels unnecessarily verbose and clunky:
    # aggregate the value of rows where (from, to, type) == (to, from, type)
    df2 = df.reset_index()
    df3 = df2.rename(columns={'from': 'to', 'to': 'from'})
    df_both = df.join(df3.set_index(
        ['from', 'to', 'type']), rsuffix='_b').sum(axis=1)

    # remove the second, i.e. the (to, from, t) row
    rows_to_keep = []
    rows_to_remove = []
    for a, b, t in df_both.index:
        if (b, a, t) in df_both.index and (b, a, t) not in rows_to_keep:
            rows_to_keep.append((a, b, t))
            rows_to_remove.append((b, a, t))

    df_final = df_both.drop(rows_to_remove)
    df_final

Especially the second "de-duplication" step feels unpythonic. (How) can I improve these steps?
Not sure how much better this is, but it's different:
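    import pandas as pd
    from collections import Counter

    df = pd.DataFrame([
        ['a', 'b', 1, 5],
        ['b', 'c', 2, 2],
        ['b', 'a', 1, 1],
        ['c', 'b', 1, 3]],
        columns=['from', 'to', 'type', 'value'])
    df = df.set_index(['from', 'to', 'type'])

    ls = df.to_records()
    ls = list(ls)
    ls2 = []
    for l in ls:
        i = 0
        # repeat each (from, to, type) key `value` times
        # (`<` rather than `<=`, otherwise every row is counted once too often)
        while i < l[3]:
            ls2.append(list(l)[:3])
            i += 1

    # sort only the from/to pair so (a, b) and (b, a) share one key;
    # sorting all three fields would compare str with int and fail on Python 3
    counted = Counter(tuple(sorted(entry[:2])) + (entry[2],) for entry in ls2)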
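The same idea, grouping on a direction-independent key, can probably be expressed without explicit loops by using `groupby`. The following is only a sketch under some assumptions: the helper columns `lo` and `hi` are hypothetical names introduced here, the row that appears first is taken as the orientation to keep, and all rows sharing an unordered (from, to) pair and type are summed together.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 'b', 1, 5],
     ['b', 'c', 2, 2],
     ['b', 'a', 1, 1],
     ['c', 'b', 1, 3]],
    columns=['from', 'to', 'type', 'value'],
).set_index(['from', 'to', 'type'])

flat = df.reset_index()

# Build an order-independent key from the two endpoints.
# 'lo' and 'hi' are hypothetical helper columns, not part of the original data.
pairs = [tuple(sorted(p)) for p in zip(flat['from'], flat['to'])]
flat['lo'] = [p[0] for p in pairs]
flat['hi'] = [p[1] for p in pairs]

# Rows with the same unordered pair and type fall into one group.
# sort=False keeps groups in order of first appearance; 'first' keeps the
# orientation of the first row in each group, 'sum' adds up the values.
result = (
    flat.groupby(['lo', 'hi', 'type'], sort=False)
        .agg({'from': 'first', 'to': 'first', 'value': 'sum'})
        .reset_index()
        .set_index(['from', 'to', 'type'])[['value']]
)

print(result)
```

On the sample input this gives value 6 for (a, b, 1) and leaves (b, c, 2) and (c, b, 1) unchanged. Note that (c, b, 1) keeps its original orientation because the sorted pair is only used as the grouping key, not as the final index.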