python - ValueError: Unknown label type: array([0.11],...) when making extra trees model
I'm trying to use an extra trees classifier on a dataset and, for some reason, at the
model.fit(trainx, trainy)
part, it throws a
ValueError: Unknown label type: array([[ 0.11], [ 0.12], [ 0.64], [ 0.83], [ 0.33], [ 0.72], [ 0.49],
error. The array([0.11], ...) is my trainy data. I've searched Stack Overflow, and apparently this is due to sklearn not recognizing the data type, so I've tried both
trainy = np.asarray(trainy, dtype=float)
trainy = trainy.astype(float)
and neither works, even though type(trainy) shows numpy.ndarray. Can anyone point me in the right direction here?
Here's the code:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn import metrics
from sklearn.ensemble import ExtraTreesClassifier
from sklearn import cross_validation

def preprocess():
    df = pd.read_csv('c:/users/x/desktop/managerial_and_decision_economics_2013_video_games_dataset.csv', encoding='iso-8859-1')
    # drop non ea
    df = df[df['ea'] == 1]
    # change categorical variables
    le = LabelEncoder()
    nonnumeric_columns = ['console', 'title', 'publisher', 'genre']
    for feature in nonnumeric_columns:
        df[feature] = le.fit_transform(df[feature])
    # set dataset and target variables
    dataset = df.ix[:, df.columns != 'us sales (millions)']
    target = df['us sales (millions)']
    trainx, testx, trainy, testy = cross_validation.train_test_split(
        dataset, target, test_size=0.3, random_state=0)
    # attempt to fix the error?
    trainx = np.array(trainx)
    trainy = np.asarray(trainy, dtype="float")
    return trainx, testx, trainy, testy

def classifier():
    model = ExtraTreesClassifier(n_estimators=250, random_state=0)
    model.fit(trainx, trainy)
    return model.score(testx, testy)

trainx, testx, trainy, testy = preprocess()
I'm using scikit-learn 0.17 on Python 3.5.
Your labels are [[0.11], [0.12], ...], which are real-valued numbers rather than class labels. You should use ExtraTreesRegressor
instead of ExtraTreesClassifier.
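As a rough illustration of that swap, here is a minimal sketch that fits ExtraTreesRegressor on continuous targets. The data is synthetic and the names X, y, train_X and so on are made up for this example; it does not use the video game dataset from the question. Note that the regressor's score() reports R^2 rather than classification accuracy.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic stand-in data (an assumption, not the dataset from the question).
rng = np.random.RandomState(0)
X = rng.rand(100, 4)
# Continuous targets, similar in kind to sales figures such as 0.11 or 0.64.
y = 2.0 * X[:, 0] + 0.1 * rng.rand(100)

# Simple hold-out split: first 70 rows for training, the rest for testing.
train_X, test_X = X[:70], X[70:]
train_y, test_y = y[:70], y[70:]

# The regressor accepts real-valued targets, unlike the classifier.
model = ExtraTreesRegressor(n_estimators=50, random_state=0)
model.fit(train_X, train_y)

predictions = model.predict(test_X)
r2 = model.score(test_X, test_y)  # R^2 on the held-out rows
print(r2)
```

If the target arrives as a column vector of shape (n_samples, 1), as the error message in the question suggests, flattening it with np.ravel(trainy) before fitting is probably a good idea.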
From the source code of ForestClassifier:
y : array-like, shape = [n_samples] or [n_samples, n_outputs]
    The target values (class labels in classification, real numbers in regression).
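To see how scikit-learn itself categorizes a target array, one option is the type_of_target helper in sklearn.utils.multiclass. The sketch below uses small made-up arrays (float_labels and class_labels are hypothetical names) and suggests why casting with astype(float) would not help: the values stay continuous whatever the dtype conversion.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up examples: real-valued targets versus integer class labels.
float_labels = np.array([0.11, 0.12, 0.64, 0.83])
class_labels = np.array([0, 1, 1, 2])

float_kind = type_of_target(float_labels)
class_kind = type_of_target(class_labels)

# Casting to float again does not change how the values are categorized.
recast_kind = type_of_target(float_labels.astype(float))

print(float_kind)   # continuous
print(class_kind)   # multiclass
print(recast_kind)  # continuous
```

A target reported as continuous is a regression target, which matches the docstring's distinction between class labels and real numbers.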