R - Weighting classes in a machine learning task


I'm trying out a machine learning task (binary classification) using caret, and I'm wondering if there is a way to incorporate information about an "uncertain" class, or to weight classes differently.

As an illustration, I've cut and pasted some of the code from the caret homepage that works with the Sonar dataset (placeholder code, it could be anything):

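    library(mlbench)
    # data() returns the dataset name, so get() fetches the Sonar data frame
    testdat <- get(data(Sonar))
    set.seed(946)
    # add a made-up "source" column; a, b and c are drawn twice as often as d, e and f
    testdat$source <- as.factor(sample(c(letters[1:6], letters[1:3]), nrow(testdat), replace = TRUE))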

yielding:

    summary(testdat$source)
     a  b  c  d  e  f
    49 51 44 17 28 19

After that, I continue with the typical train, tune, and test routine once I decide on a model.

What I've added here is a factor column for the source, i.e. where the corresponding "class" came from. As an arbitrary example, say these were 6 different people who made their designation of "class" using different methods, and I want to put greater importance on A's classification method than B's, but less than C's, and so forth.

The actual data are like this: there are class imbalances, both among the true/false, M/R, or whatever the class is, and among these sources. From the vignettes and examples I have found, I would address at least the former by using a metric like ROC during tuning, but I'm not sure how to incorporate the latter.

Two ideas I have considered:

  • Separating the original data by source and cycling through the factor levels one at a time, using the current level to build a model and the rest of the data to test it.

  • Instead of classification, turning this into a hybrid classification/regression problem, where I use the ranks of the sources as what I want to model. If A is considered the best, then an "A positive" would get a score of +6, an "A negative" a score of -6, and so on. I would then perform a regression fit on these values, ignoring the class column.
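The first idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated rather than taken from Sonar, the predictors `x1` and `x2` and the class labels are made up, and a plain logistic regression from base R stands in for whatever model would actually be tuned.

```r
# Leave-one-source-in sketch: train on one source, test on all the others.
# All data here are simulated; x1, x2 and the labels are hypothetical.
set.seed(946)
n <- 600
dat <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  source = factor(sample(letters[1:6], n, replace = TRUE))
)
# Simulated binary class that depends on x1 and x2 plus noise
dat$class <- factor(ifelse(dat$x1 - dat$x2 + rnorm(n) > 0, "M", "R"))

acc <- sapply(levels(dat$source), function(s) {
  train <- dat[dat$source == s, ]   # build the model on the current source
  test  <- dat[dat$source != s, ]   # evaluate on the remaining sources
  fit <- glm(class ~ x1 + x2, data = train, family = binomial())
  # glm models the probability of the second factor level ("R")
  p <- predict(fit, newdata = test, type = "response")
  pred <- ifelse(p > 0.5, "R", "M")
  mean(pred == test$class)          # accuracy on the held-out sources
})
round(acc, 3)
```

Each entry of `acc` is the accuracy of the model built from one source when applied to everything else, which makes the concern about small sources easy to inspect directly.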

Any thoughts? Every search I conduct on classes and weights seems to reference the class imbalance issue, but assumes that the classification itself is perfect (or the standard on which to model). Is it inappropriate to even try to incorporate this information, and should I just include everything and ignore the source? A potential issue with the first plan is that the smaller sources account for around a few hundred instances, versus over 10,000 for the larger sources, so I might also be concerned that a model built on a smaller set wouldn't generalize as well as one based on more data. Any thoughts would be appreciated.

There is no difference between weighting "because of importance" and weighting "because of imbalance". These are exactly the same settings: both refer to "how strongly should I penalize the model for misclassifying a sample from a particular class". So you do not need any regression (and should not do so! This is a perfectly well-stated classification problem, and you are simply overthinking it). You just need to provide sample weights, that's all.

There are many models in caret that accept this kind of setting, including glmnet, glm, cforest etc. If you want to use an SVM you should change the package (as ksvm does not support such things), for example to https://cran.r-project.org/web/packages/gmum.r/gmum.r.pdf (for sample or class weighting) or https://cran.r-project.org/web/packages/e1071/e1071.pdf (if it is class weighting).
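As a minimal sketch of what "providing sample weights" could look like, the example below builds one weight per row and passes it to a base R `glm`. Everything specific in it is an assumption for illustration: the data are simulated, the source weights 6 down to 1 are hypothetical, and multiplying a source weight by an inverse-frequency class weight is just one plausible way to combine the two concerns.

```r
# Sample-weighting sketch with simulated data; all weights are hypothetical.
set.seed(946)
n <- 300
dat <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  source = factor(sample(letters[1:6], n, replace = TRUE), levels = letters[1:6])
)
# Simulated binary class that depends on x1 and x2 plus noise
dat$class <- factor(ifelse(dat$x1 - dat$x2 + rnorm(n) > 0, "M", "R"))

# Hypothetical trust in each source: a is trusted most, f least
source_weight <- c(a = 6, b = 5, c = 4, d = 3, e = 2, f = 1)

# Inverse-frequency class weights to offset class imbalance
class_weight <- 1 / prop.table(table(dat$class))

# One weight per sample: product of its source weight and its class weight
dat$w <- unname(
  source_weight[as.character(dat$source)] *
    as.numeric(class_weight[as.character(dat$class)])
)

# quasibinomial gives the same coefficients as binomial here, but avoids
# the warning that binomial raises for non-integer weights
fit <- glm(class ~ x1 + x2, data = dat, family = quasibinomial(), weights = w)
summary(fit)$coefficients
```

In caret, `train()` has a `weights` argument for case weights that is only used by models that support them, so a vector like `dat$w` could be supplied there in the same spirit.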

