R - Fill elements of a column based on multiple criteria
I have a data frame from which I want to remove any week that contains an outlier. I would be happy if I could just indicate that the entire week is an outlier, since I understand how to subset from there. I have not been able to come up with an appropriate solution. I keep thinking I am going to need to loop through subsets of weeks to achieve the desired goal, or create a separate function to handle an individual outlier week and then use sapply. I have yet to make either of these solutions viable.
# Build one year of daily dates with their weekday names
date <- seq(as.Date("2015-01-01"), length=365, by="1 day")
dow <- as.factor(weekdays(as.Date(date)))
df <- data.frame(cbind(date, dow))
df$date <- as.Date(df$date, format="%m/%d/%Y", origin="01/01/1970")
df$dow <- as.factor(weekdays(as.Date(df$date)))

# Simulated values and outlier bounds at 2.75 standard deviations from the mean
set.seed(1115)
df$var1 <- rnorm(365, 1912, 40795)
stdev <- sd(df$var1, na.rm=TRUE)
avg <- mean(df$var1, na.rm=TRUE)
df$lb <- avg-(2.75*stdev)
df$ub <- avg+(2.75*stdev)
df$outlier <- ifelse(df$var1<df$lb | df$var1>df$ub, 1, 0)

# Week of the year, with Sunday as the first day of the week
df$weeknum <- as.numeric(format(df$date, "%U"))

> head(df, 17)
         date       dow       var1        lb       ub outlier weeknum
1  2015-01-01  Thursday  -7828.412 -114675.6 120479.8       0       0
2  2015-01-02    Friday  25674.456 -114675.6 120479.8       0       0
3  2015-01-03  Saturday -33588.871 -114675.6 120479.8       0       0
4  2015-01-04    Sunday -54418.175 -114675.6 120479.8       0       1
5  2015-01-05    Monday -10002.002 -114675.6 120479.8       0       1
6  2015-01-06   Tuesday  34050.390 -114675.6 120479.8       0       1
7  2015-01-07 Wednesday -37584.648 -114675.6 120479.8       0       1
8  2015-01-08  Thursday  84048.878 -114675.6 120479.8       0       1
9  2015-01-09    Friday -24801.346 -114675.6 120479.8       0       1
10 2015-01-10  Saturday  33974.637 -114675.6 120479.8       0       1
11 2015-01-11    Sunday  77432.088 -114675.6 120479.8       0       2
12 2015-01-12    Monday 128196.236 -114675.6 120479.8       1       2
13 2015-01-13   Tuesday   9740.418 -114675.6 120479.8       0       2
14 2015-01-14 Wednesday  26539.887 -114675.6 120479.8       0       2
15 2015-01-15  Thursday  12172.834 -114675.6 120479.8       0       2
16 2015-01-16    Friday   1032.544 -114675.6 120479.8       0       2
17 2015-01-17  Saturday  76870.095 -114675.6 120479.8       0       2
In the above example, the desired output is a 1 in the outlier column in each row that corresponds to weeknum = 2.
You say "the desired output is a 1 in the outlier column in each row that corresponds to weeknum = 2." Do you need the outlier column, then? It seems you can subset the data.frame based on the values of the weeknum column, as follows:
df <- df[!(df$weeknum==2),]
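The line above hard-codes week 2. If the affected weeks are not known in advance, one possible generalisation is to propagate the outlier flag to every row of the same week with base R's ave() and then subset on that. This is only a sketch under assumptions: it rebuilds a reduced version of the question's data so it can run on its own, and the names outlier_week and df_clean are made up for illustration, not taken from the question.

```r
# Rebuild a reduced version of the question's data (same seed and parameters).
set.seed(1115)
df <- data.frame(date = seq(as.Date("2015-01-01"), length.out = 365, by = "1 day"))
df$var1 <- rnorm(365, 1912, 40795)
avg <- mean(df$var1, na.rm = TRUE)
stdev <- sd(df$var1, na.rm = TRUE)
df$outlier <- ifelse(df$var1 < avg - 2.75 * stdev | df$var1 > avg + 2.75 * stdev, 1, 0)
df$weeknum <- as.numeric(format(df$date, "%U"))

# ave() applies FUN within each weeknum group and returns a vector of the
# original length, so every row of a week gets that week's maximum flag.
# outlier_week is a hypothetical column name.
df$outlier_week <- ave(df$outlier, df$weeknum, FUN = max)

# Keep only the weeks that contain no outlier at all (df_clean is hypothetical).
df_clean <- df[df$outlier_week == 0, ]
```

If you would rather overwrite the existing column than add a new one, assigning the ave() result back to df$outlier should give the "1 in every row of the week" layout described in the question. No loop or sapply is needed, because ave() does the per-week grouping itself.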