pandas - Aggregate/remove duplicate rows in DataFrame based on swapped index levels
Sample input
import pandas as pd

df = pd.DataFrame(
    [['a', 'b', 1, 5],
     ['b', 'c', 2, 2],
     ['b', 'a', 1, 1],
     ['c', 'b', 1, 3]],
    columns=['from', 'to', 'type', 'value'])
df = df.set_index(['from', 'to', 'type'])
which looks like this:
              value
from to type
a    b  1         5
b    c  2         2
     a  1         1
c    b  1         3
Goal
I want to remove "duplicate" rows in the following sense: for each row with an arbitrary index (from, to, type), if there also exists a row with the index (to, from, type), the value of the second row should be added to the first row and the second row dropped. In the example above, the row (b, a, 1) with value 1 should be added to the first row and then dropped, leading to the following desired result.
Sample result
              value
from to type
a    b  1         6
b    c  2         2
c    b  1         3
This is my best attempt so far. It feels unnecessarily verbose and clunky:
# aggregate the value of rows where (from, to, type) == (to, from, type)
df2 = df.reset_index()
df3 = df2.rename(columns={'from': 'to', 'to': 'from'})
df_both = df.join(df3.set_index(['from', 'to', 'type']), rsuffix='_b').sum(axis=1)

# remove the second, i.e. (to, from, type), row of each pair
rows_to_keep = []
rows_to_remove = []
for a, b, t in df_both.index:
    if (b, a, t) in df_both.index and not (b, a, t) in rows_to_keep:
        rows_to_keep.append((a, b, t))
        rows_to_remove.append((b, a, t))

df_final = df_both.drop(rows_to_remove)
df_final
Especially the second "de-duplication" step feels unpythonic. (How) can I improve these steps?
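For comparison, here is a sketch of one way to condense both steps (an illustration, not taken from the post: it leans on index alignment in Series.add, assumes from and to are never equal, and folds each mirrored pair into whichever orientation sorts first):

import pandas as pd

df = pd.DataFrame(
    [['a', 'b', 1, 5],
     ['b', 'c', 2, 2],
     ['b', 'a', 1, 1],
     ['c', 'b', 1, 3]],
    columns=['from', 'to', 'type', 'value']).set_index(['from', 'to', 'type'])

# Same values, but keyed by the swapped (to, from, type) index.
mirrored = df['value'].swaplevel('from', 'to')
mirrored.index.names = ['from', 'to', 'type']

# Add the mirrored value wherever a mirrored row exists (alignment is by index),
# then restrict the result to the original rows.
total = df['value'].add(mirrored, fill_value=0).reindex(df.index)

# Keep a row if its mirror is absent or this row's orientation sorts first.
keep = [(b, a, t) not in df.index or (a, b) < (b, a) for a, b, t in df.index]
result = total[keep]

On the sample data this reproduces the desired result (as a float Series, because of the NaN fill); rows without a mirror, such as (c, b, 1), keep their original orientation.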
Not sure how much better this is, but it's different:
import pandas as pd
from collections import Counter

df = pd.DataFrame(
    [['a', 'b', 1, 5],
     ['b', 'c', 2, 2],
     ['b', 'a', 1, 1],
     ['c', 'b', 1, 3]],
    columns=['from', 'to', 'type', 'value'])
df = df.set_index(['from', 'to', 'type'])

# Expand every row into `value` copies of its (from, to, type) key.
ls = df.to_records()
ls = list(ls)
ls2 = []
for l in ls:
    i = 0
    while i < l[3]:
        ls2.append(list(l)[:3])
        i += 1

# Count each key with the (from, to) pair sorted, so that swapped
# duplicates fall onto the same key.
counted = Counter(tuple(sorted(entry[:2])) + (entry[2],) for entry in ls2)
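Continuing from that snippet, the Counter could be turned back into an indexed frame along these lines (my addition, not part of the original answer; note that the surviving key of each merged pair is the sorted orientation, so (c, b, 1) comes back as (b, c, 1)):

# Rebuild a DataFrame from the counted (from, to, type) -> summed value mapping.
result = (pd.DataFrame([key + (total,) for key, total in counted.items()],
                       columns=['from', 'to', 'type', 'value'])
            .set_index(['from', 'to', 'type']))
print(result)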