python - Memory usage of OneVsRestClassifier in sklearn
I'm having trouble with memory usage when using sklearn's OneVsRestClassifier class in a loop for cross-validation (we cannot use sklearn's cross-validation methods for a different reason not related to this question).
The classifier is set up like this:

    clf = make_pipeline(TfidfVectorizer(),
                        OneVsRestClassifier(SGDClassifier(loss='log', n_iter=5,
                                                          penalty=None, alpha=1e-7,
                                                          average=True)))
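For context, here is a minimal, self-contained sketch of the kind of manual cross-validation loop described (the `folds` iterable and its contents are hypothetical placeholders, not our actual setup):

    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.linear_model import SGDClassifier

    clf = make_pipeline(
        TfidfVectorizer(),
        OneVsRestClassifier(SGDClassifier(loss='log', n_iter=5, penalty=None,
                                          alpha=1e-7, average=True)))

    predictions = []
    for X_train, y_train, X_test in folds:   # folds: hypothetical CV splits
        clf.fit(X_train, y_train)            # memory grows on each iteration
        predictions.append(clf.predict(X_test))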
Using the memory_profiler module, the memory usage looks like this:
    Line #    Mem usage    Increment   Line Contents
    ================================================
    (...)
       231   1092.8 MiB    911.7 MiB       clf.fit(X_train, y_train)
    next loop:
       231   2001.8 MiB    897.4 MiB       clf.fit(X_train, y_train)
    next loop:
       231   2892.5 MiB    890.5 MiB       clf.fit(X_train, y_train)
    next loop:
       231   2928.1 MiB     35.6 MiB       clf.fit(X_train, y_train)
    next loop:
       231   2977.6 MiB     49.5 MiB       clf.fit(X_train, y_train)
    next loop:
       231   3009.2 MiB     31.6 MiB       clf.fit(X_train, y_train)
    next loop:
       231   3014.6 MiB      5.4 MiB       clf.fit(X_train, y_train)
    next loop:
       231   3019.4 MiB      4.8 MiB       clf.fit(X_train, y_train)
    next loop:
       231   3031.0 MiB     11.6 MiB       clf.fit(X_train, y_train)
    next loop:
       231   3041.4 MiB     10.4 MiB       clf.fit(X_train, y_train)
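For reference, a profile like the one above can be produced with memory_profiler's @profile decorator (the helper function here is a hypothetical stand-in for our loop body; line 231 corresponds to the clf.fit call):

    from memory_profiler import profile

    @profile
    def fit_fold(clf, X_train, y_train):
        # memory_profiler reports mem usage and increment per line;
        # this is the line showing the large increments above.
        clf.fit(X_train, y_train)

and then running the script with:

    python -m memory_profiler script.py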
As you can see, the memory usage increases steeply at the beginning and then more slowly.
When using classifiers without OneVsRestClassifier, memory usage stays more or less constant, since the classifier is re-fit with new training data in each loop. That is the behavior I expect.
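For comparison, one way to set up such a variant is to drop only the OneVsRestClassifier layer and leave everything else unchanged (a sketch, not our exact code):

    clf = make_pipeline(TfidfVectorizer(),
                        SGDClassifier(loss='log', n_iter=5, penalty=None,
                                      alpha=1e-7, average=True))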
Why is this happening, and how can I prevent it? Is it an sklearn bug, or something more low-level? Note that re-initializing the classifier each iteration would be bad for our current code.
Update:
I changed the code to reinitialize the classifier, and to delete clf with the del keyword after getting the prediction, but the problem remains. Furthermore, the allocated memory does not show up in pympler's summary, so it seems to be neither a Python object nor a numpy array (as expected). Using .sparsify() does not change anything.
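Concretely, the diagnostics from this update look roughly like the following (a sketch, assuming the clf pipeline from above and one fold's data):

    import gc
    from pympler import muppy, summary

    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)

    # List live Python objects; the growing allocation did not show up here.
    summary.print_(summary.summarize(muppy.get_objects()))

    # Convert the fitted per-class coefficients to scipy sparse matrices.
    ovr = clf.named_steps['onevsrestclassifier']
    for est in ovr.estimators_:   # one fitted SGDClassifier per class
        est.sparsify()

    # Drop the classifier explicitly; the problem remained regardless.
    del clf
    gc.collect()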
When we run up against the memory limit, swap starts to fill, so the OS does not seem to be able to reuse the memory.
Note that the above example runs on only a small part of the dataset.