python - Memory usage of OneVsRest in sklearn


I'm having trouble with memory usage when using sklearn's OneVsRest class in a cross-validation loop (we cannot use sklearn's cross-validation methods, for a different reason not related to this question).

The classifier is set up like this:

clf = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(SGDClassifier(loss='log', n_iter=5, penalty=None, alpha=1e-7, average=True)))
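For context, a hand-rolled cross-validation loop of the kind described here might look roughly like the sketch below. The helper `kfold_indices` is hypothetical and not part of the original code; it only illustrates the shape of the loop in which the same pipeline is re-fit on each fold, using the standard library alone.

```python
import random


def kfold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) index lists for a shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in kfold_indices(n_samples=10, n_folds=3):
    # In the real code, this is where the same pipeline is re-fit, e.g.
    #   clf.fit(x_train, y_train) followed by clf.predict(x_test)
    print(len(train_idx), len(test_idx))
```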

Using the memory_profiler module, the memory usage looks like this:

    Line #    Mem usage    Increment   Line Contents
    ================================================
    (...)
       231   1092.8 MiB    911.7 MiB       clf.fit(x_train, y_train)
    (...)
    next loop:
       231   2001.8 MiB    897.4 MiB       clf.fit(x_train, y_train)
    next loop:
       231   2892.5 MiB    890.5 MiB       clf.fit(x_train, y_train)
    next loop:
       231   2928.1 MiB     35.6 MiB       clf.fit(x_train, y_train)
    next loop:
       231   2977.6 MiB     49.5 MiB       clf.fit(x_train, y_train)
    next loop:
       231   3009.2 MiB     31.6 MiB       clf.fit(x_train, y_train)
    next loop:
       231   3014.6 MiB      5.4 MiB       clf.fit(x_train, y_train)
    next loop:
       231   3019.4 MiB      4.8 MiB       clf.fit(x_train, y_train)
    next loop:
       231   3031.0 MiB     11.6 MiB       clf.fit(x_train, y_train)
    next loop:
       231   3041.4 MiB     10.4 MiB       clf.fit(x_train, y_train)
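As a cross-check on numbers like these, the per-iteration growth can also be recorded with the standard library's `tracemalloc`. The following is only a sketch: `profile_iterations` and `leaky_step` are hypothetical names, and `leaky_step` is a stand-in for `clf.fit(...)` that deliberately keeps about 1 MB alive per call. Note that `tracemalloc` only sees allocations made through Python's memory allocators, so it will not necessarily agree with process-level figures.

```python
import tracemalloc


def profile_iterations(step, n_iterations):
    """Call step(i) repeatedly; return traced memory (bytes) after each call."""
    tracemalloc.start()
    try:
        usage = []
        for i in range(n_iterations):
            step(i)
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current)
        return usage
    finally:
        tracemalloc.stop()


retained = []  # simulates state that survives between loop iterations


def leaky_step(i):
    # Stand-in for clf.fit(...): keeps roughly 1 MB alive per call.
    retained.append(bytearray(1_000_000))


usage = profile_iterations(leaky_step, 5)
increments = [after - before for before, after in zip(usage, usage[1:])]
print([round(inc / 2**20, 2) for inc in increments])  # growth per loop, in MiB
```

If the increments stay near zero for a real `fit` call while the process-level figure keeps growing, that would hint that the memory is being held outside what Python's allocators track.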

As you can see, memory usage increases steeply in the beginning, and then more slowly.

With classifiers that do not use OneVsRest, memory usage stays more or less constant, since the classifier is re-fit on new training data in each loop. This is the behavior I would expect.

Why is this happening? How can I prevent it? Is this an sklearn bug or something more low-level? Note that re-initializing the classifier is bad for our current code.

Update:

I changed the code to reinitialize the classifier, and deleted it using the clf key after getting the prediction, and the problem remains. Furthermore, the allocated memory does not turn up using pympler.summary, so it seems that it is not a Python object but a numpy array (as expected). Using .sparsify() does not change anything.
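One way to confirm that deleting the classifier really releases the Python object (as opposed to some other reference keeping it alive) is a weak-reference check. This is a minimal sketch using only the standard library; `FittedModel` and the `models` dict are hypothetical stand-ins for the real classifier and the container holding it under the `clf` key.

```python
import gc
import weakref


class FittedModel:
    """Hypothetical stand-in for a fitted classifier holding a large buffer."""

    def __init__(self, n_bytes):
        self.coef = bytearray(n_bytes)


models = {"clf": FittedModel(1_000_000)}
ref = weakref.ref(models["clf"])  # a weak reference does not keep the object alive

del models["clf"]  # drop the only strong reference
gc.collect()       # additionally clear any reference cycles

# If this prints False, something else still references the object.
print(ref() is None)
```

Even when the object is gone, the process's memory footprint may not shrink, because the underlying allocator does not always hand freed memory back to the OS.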

When running into the memory limit, swap is being filled, so the OS does not seem to be able to reuse the memory.

Note that the above example was run on a small part of the dataset.

