r - How to reshape a dataframe with "reoccurring" columns? -
i new data analysis r. got pre-formatted environmental observation-model dataset, example subset of shown below:
date site obs mod site obs mod 2000-09-01 00:00:00 campus na 61.63 city centre 66 56.69 2000-09-01 01:00:00 campus 52 62.55 city centre na 54.75 2000-09-01 02:00:00 campus 52 63.52 city centre 56 54.65 basically, data include time series of hourly observed , modelled concentrations of pollutant @ various sites in "reoccurring columns", i.e., site - obs - mod (in example showed 2 out of total 75 sites). read "wide" dataset in data frame, , wanted reshape "narrower" format as:
date site obs mod 2000-09-01 00:00:00 campus na 61.63 2000-09-01 01:00:00 campus 52 62.55 2000-09-01 02:00:00 campus 52 63.52 2000-09-01 00:00:00 city centre 66 56.69 2000-09-01 01:00:00 city centre na 54.75 2000-09-01 02:00:00 city centre 56 54.65 i believed should use package "reshape2" this. firstly tried melt , dcast dataset:
test.melt <- melt(test.data, id.vars = "date", measure.vars = c("site", "obs", "mod")) however, returned half of data, i.e., records of site(s) ("city centre") following first 1 ("campus") cut off:
date variable value 2001-01-01 00:00:00 site campus 2001-01-01 01:00:00 site campus 2001-01-01 02:00:00 site campus 2001-01-01 00:00:00 obs na 2001-01-01 01:00:00 obs 52 2001-01-01 02:00:00 obs 52 2001-01-01 00:00:00 mod 61.63 2001-01-01 01:00:00 mod 62.55 2001-01-01 02:00:00 mod 63.52 i tried recast:
test.recast <- recast(test.data, date ~ site + obs + mod) however, returned error message:
error in eval(expr, envir, enclos) : object 'site' not found i have tried search previous questions have not found similar scenarios (correct me if wrong). please me this?
many in advance!
you might better off using base r reshape after doing variable name cleanup.
here's data.
test <- read.table(header = true, stringsasfactors=false, text = "date site obs mod site obs mod '2000-09-01 00:00:00' campus na 61.63 'city centre' 66 56.69 '2000-09-01 01:00:00' campus 52 62.55 'city centre' na 54.75 '2000-09-01 02:00:00' campus 52 63.52 'city centre' 56 54.65") test # date site obs mod site.1 obs.1 mod.1 # 1 2000-09-01 00:00:00 campus na 61.63 city centre 66 56.69 # 2 2000-09-01 01:00:00 campus 52 62.55 city centre na 54.75 # 3 2000-09-01 02:00:00 campus 52 63.52 city centre 56 54.65 if did correctly, should names got: @chase mentions in answer, "recurring column names bit of oddity , not normal r behaviour"--so we've got fix that.
note: both of these options generate "time" variable can go ahead , drop. might want keep in case wanted reshape wide format.
option 1: if got names did (which should have), solution simple. first site, append "0" site name , use base r reshape:
names(test)[2:4] <- paste(names(test)[2:4], "0", sep=".") test <- reshape(test, direction = "long", idvar = "date", varying = 2:ncol(test)) rownames(test) <- null # reshape makes ugly rownames test # date time site obs mod # 1 2000-09-01 00:00:00 0 campus na 61.63 # 2 2000-09-01 01:00:00 0 campus 52 62.55 # 3 2000-09-01 02:00:00 0 campus 52 63.52 # 4 2000-09-01 00:00:00 1 city centre 66 56.69 # 5 2000-09-01 01:00:00 1 city centre na 54.75 # 6 2000-09-01 02:00:00 1 city centre 56 54.65option 2: if have duplicated column names, fix still easy, , follows same logic. first, create nicer column names (easy using
rep()), , usereshape()described above.names(test)[-1] <- paste(names(test)[-1], rep(1:((ncol(test)-1)/3), each = 3), sep = ".") test <- reshape(test, direction = "long", idvar = "date", varying = 2:ncol(test)) rownames(test) <- null ### or, more convenient: # names(test) <- make.unique(names(test)) # names(test)[2:4] <- paste(names(test)[2:4], "0", sep=".") # test <- reshape(test, direction = "long", # idvar = "date", varying = 2:ncol(test)) # rownames(test) <- nulloptional step: data in form still not totally "long". if required, required 1 more step:
require(reshape2) melt(test, id.vars = c("date", "site", "time")) # date site time variable value # 1 2000-09-01 00:00:00 campus 0 obs na # 2 2000-09-01 01:00:00 campus 0 obs 52.00 # 3 2000-09-01 02:00:00 campus 0 obs 52.00 # 4 2000-09-01 00:00:00 city centre 1 obs 66.00 # 5 2000-09-01 01:00:00 city centre 1 obs na # 6 2000-09-01 02:00:00 city centre 1 obs 56.00 # 7 2000-09-01 00:00:00 campus 0 mod 61.63 # 8 2000-09-01 01:00:00 campus 0 mod 62.55 # 9 2000-09-01 02:00:00 campus 0 mod 63.52 # 10 2000-09-01 00:00:00 city centre 1 mod 56.69 # 11 2000-09-01 01:00:00 city centre 1 mod 54.75 # 12 2000-09-01 02:00:00 city centre 1 mod 54.65
update (to try address questions comments)
the
reshape()documentation quite confusing. it's best work through few examples understanding of how works. specifically, "time" not have refer time ("date" in problem), more for, say, panel data, records collected @ different times same id. in case, "id" in original data "date" column. other potential "id" site, not in way data organized.imagine, moment, if data looked this:
test1 <- structure(list(date = structure(1:3, .label = c("2000-09-01 00:00:00", "2000-09-01 01:00:00", "2000-09-01 02:00:00"), class = "factor"), obs.campus = c(na, 52l, 52l), mod.campus = c(61.63, 62.55, 63.52), obs.citycentre = c(66l, na, 56l), mod.citycentre = c(56.69, 54.75, 54.65)), .names = c("date", "obs.campus", "mod.campus", "obs.citycentre", "mod.citycentre"), class = "data.frame", row.names = c(na, -3l)) test1 # date obs.campus mod.campus obs.citycentre mod.citycentre # 1 2000-09-01 00:00:00 na 61.63 66 56.69 # 2 2000-09-01 01:00:00 52 62.55 na 54.75 # 3 2000-09-01 02:00:00 52 63.52 56 54.65now try
reshape(test1, direction = "long", idvar = "date", varying = 2:ncol(test1)). you'll seereshape()sees site names "time" (that can overridden adding "timevar = "site""reshapecommand).when
direction = "long", must specify columns vary "time". in case, columns except first, hence use of2:ncol(test)"varying".test2? that?question under @chase's answer: think misunderstand how
melt()supposed work. basically, tries "skinniest" form of data. in case, skinniest form "optional step" described above sincedate+siteminimum required comprise unique id variable. (i "time" can safely dropped.)once data in format described in "optional step" (we'll assume output has been stored "
test.melt", can pivot table around in different ways. demonstration of mean that, try following , see do.dcast(test.melt, date + site ~ variable) dcast(test.melt, date ~ variable + site) dcast(test.melt, variable + site ~ date) dcast(test.melt, variable + date ~ site)it not easy have flexibility if stop @ "option 1" or "option 2".
update (a few years later)
melt "data.table" can "melt" multiple columns in similar way reshape does. should work whether or not column names duplicated.
you can try following:
measure <- c("site", "obs", "mod") melt(as.data.table(test), measure.vars = patterns(measure), value.name = measure) # date variable site obs mod # 1: 2000-09-01 00:00:00 1 campus na 61.63 # 2: 2000-09-01 01:00:00 1 campus 52 62.55 # 3: 2000-09-01 02:00:00 1 campus 52 63.52 # 4: 2000-09-01 00:00:00 2 city centre 66 56.69 # 5: 2000-09-01 01:00:00 2 city centre na 54.75 # 6: 2000-09-01 02:00:00 2 city centre 56 54.65
Comments
Post a Comment