r - Combine two dataframes that have common rows and columns (fill in) -
i have big dataframe want fill in results sql queries many databases, kind of "filling in data cubbyholes" speak. wrinkle: don't know how many cubbyholes filled in (there's group_by year, might dataframe 1 year, or many).
i'm having difficulty figuring out how accomplish this. i'm trying use dplyr package..
left_join either adds same row twice (if specify
by=
), or drops new column (if don't specifyby=
, joins on both similar columns)bind_cols doesn't work
bind_rows adds duplicate row.
how can new data fill in cubbyhole itself? (btw, i'm not married dplyr... don't want iterate through every element of new dataframe)
code follows:
library(dplyr) targetdf <- structure(list(ind = c(5, 6, 7), `2015 act` = c(7870l, na, na )), .names = c("ind", "2015 act"), class = c("tbl_df", "data.frame" ), row.names = c(na, -3l)) tempdf <- structure(list(ind = 6, `2015 act` = 49782l, `2016 act` = 323l), .names = c("ind", "2015 act", "2016 act"), class = c("tbl_df", "tbl", "data.frame" ), row.names = c(na, -1l)) left_join(targetdf,tempdf, by= "ind") ## gives duplicate columns left_join(targetdf,tempdf) ## loses new "2015 act" data ind 6 bind_cols(targetdf,tempdf) ## don't work bind_rows(targetdf,tempdf) ## double ind 6 (there other columns nor included here, why can't !is.na() eliminate duplicate ind 6)
one possible way non-na
values out of each column grouped ind
, , otherwise, leave (generate) na
full_join(targetdf, tempdf) %>% group_by(ind) %>% summarise_each(funs(.[!is.na(.)][1l])) # source: local data frame [3 x 3] # # ind 2015 act 2016 act # (dbl) (int) (int) # 1 5 7870 na # 2 6 49782 323 # 3 7 na na
Comments
Post a Comment