python - How to avoid mutiple columns on Pandas.Merge -
imagine have following dataframes on pandas:
in [7]: a= pd.dataframe([['foo'],['bar'],['quz'],['baz']],columns=['key']) in [8]: a['value'] = 'none' in [9]: out[9]: key value 0 foo none 1 bar none 2 quz none 3 baz none in [10]: b = pd.dataframe([['foo',5],['bar',6],['quz',7]],columns= ['key','value']) in [11]: b out[11]: key value 0 foo 5 1 bar 6 2 quz 7 in [12]: pd.merge(a,b, on='key', how='outer') out[12]: key value_x value_y 0 foo none 5 1 bar none 6 2 quz none 7 3 baz none nan but want (avoiding repeat column basically):
key value 0 foo 5 1 bar 6 2 quz 7 3 baz nan i suppose can take output , drop _x value , rename _y seems overkill. on sql trivial.
edit:
john recomended use:
in [1]: a.set_index('key', inplace=true) a.update(b.set_index('key'), join='left', overwrite=true) a.reset_index(inplace=true) this works , asked for.
in example merging 2 dataframes same column, 1 contains strings ('none') other integers, pandas doesn't know column value want keep , should replaced, creates column both.
you can use update instead
in [10]: a.update(b, join='left', overwrite=true) in [11]: out[11]: key value 0 foo 5 1 bar 6 2 quz 7 3 baz nan another solution state values want given column:
in [15]: a.loc[b.index, 'value'] = b.value in [16]: out[16]: key value 0 foo 5 1 bar 6 2 quz 7 3 baz nan personally prefer second solution because know happening, first closer looking in question.
edit:
if indices don't match, i'm not quite sure how make happen. hence suggest making them match:
in [1]: a.set_index('key', inplace=true) a.update(b.set_index('key'), join='left', overwrite=true) a.reset_index(inplace=true) it may there better way this, don't believe pandas has way perform operation outright.
the second solution can used updated index:
in [24]: a.set_index('key', inplace=true) a.loc[b.key, 'value'] = b.value.tolist()
Comments
Post a Comment