Scala concurrency performance issues
I have a data mining app.
There is one mining actor that receives and processes a JSON document containing 1000 objects. I put the objects in a list and, in a foreach, log the data by sending it to a single logger actor, which writes the data to many files.
When I process the list sequentially, the app uses 700 MB and takes ~15 seconds at 20% CPU (on a 4-core CPU). When I parallelize the list, the app uses 2 GB and takes about the same amount of time and CPU.
My questions are:
Since I parallelized the list and the computation, shouldn't the compute time decrease? I think that having only one logger actor is the bottleneck in this case: the computation may be faster, but the bottleneck hides the speed increase. If I add more loggers to a pool, should the app's run time decrease?
Why does the memory usage jump to 2 GB? Does the JVM have to store the entire collection in memory to parallelize it? And after the computation is done, shouldn't the JVM garbage collector deal with it?
Without more details, any answer is a guess. However, a guess might point you in the right direction.
- Parallelized execution should decrease the running time, but the problem might lie elsewhere. For some reason, your CPU is idling a lot in single-threaded mode. You do not specify where you read the input from (disk or network) or where you write the output to, but you explicitly say that you write logs to a lot of files. Disk and network reads and writes might in your case take longer than the data processing, so the process sits idle while waiting on I/O. You should not expect a speedup from parallelizing a job that spends 80% of its time waiting on I/O. I therefore suspect that the loggers are not the bottleneck here.
- The memory usage might jump if the threads each allocate a lot of memory. In that case, the more threads you have, the more memory is required. I don't know what kind of collection you are parallelizing over, but it is likely stored in memory completely. And yes, the garbage collector will free any resources that you are not required to release explicitly (unlike, for example, files).
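To turn the guess into a measurement, you could time the processing step and the file writes separately. The following is only a minimal sketch under assumptions: `PhaseTiming`, `timed`, `process`, and `run` are hypothetical names, `process` is a stand-in for your real per-object work, and each record is written to its own file in a temporary directory rather than going through your logger actor.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object PhaseTiming {
  // Runs `body` and returns its result together with the elapsed time in nanoseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, System.nanoTime() - start)
  }

  // Hypothetical stand-in for the real per-object processing.
  def process(record: String): String = record.toUpperCase

  // Processes every record sequentially, writes each result to its own file,
  // and returns the accumulated (computeNanos, ioNanos).
  def run(records: Seq[String], outDir: Path): (Long, Long) = {
    var computeNanos = 0L
    var ioNanos = 0L
    records.zipWithIndex.foreach { case (record, i) =>
      val (line, c) = timed(process(record))
      val (_, w) = timed {
        Files.write(outDir.resolve(s"log-$i.txt"), line.getBytes(StandardCharsets.UTF_8))
      }
      computeNanos += c
      ioNanos += w
    }
    (computeNanos, ioNanos)
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("phase-timing")
    // Assumed input: 1000 small JSON-like strings, mirroring the question.
    val records = (1 to 1000).map(i => s"""{"id": $i}""")
    val (c, w) = run(records, dir)
    println(f"compute: ${c / 1e6}%.1f ms, file writes: ${w / 1e6}%.1f ms")
  }
}
```

If the file-write time dominates, adding threads to the processing step is unlikely to change the total much, which would match what you are seeing.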