java - How to handle multi-line logs with Spark Streaming?
I'm struggling to find a way to parse multi-line logs using Spark Streaming. I created a parser that takes an array of strings as its input parameter. When a multi-line stack trace is found, it loops over each line until it reaches a 'normal' line before processing it.
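A minimal sketch of the kind of grouping described above, in plain Java with no Spark dependency. It assumes (hypothetically) that every 'normal' line starts with a `yyyy-MM-dd` date and treats any other line (exception message, `at ...` frames, `Caused by:`) as a continuation of the previous record. `MultiLineGrouper` and `group` are illustrative names only, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: groups raw log lines into logical records.
 * Assumption: a "normal" line starts with a yyyy-MM-dd date;
 * every other line continues the previous record.
 */
class MultiLineGrouper {
    private static final Pattern NORMAL_LINE = Pattern.compile("\\d{4}-\\d{2}-\\d{2} ");

    static boolean isNormal(String line) {
        // lookingAt() only checks the beginning of the line
        return NORMAL_LINE.matcher(line).lookingAt();
    }

    static List<String> group(String[] lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (isNormal(line) || current == null) {
                // A new record starts: emit the previous one, if any.
                // A leading continuation line with no parent also starts a record.
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(line);
            } else {
                // Continuation line (stack frame, "Caused by:", ...)
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }
}
```

A continuation line at the very start of the input has no parent and becomes a record of its own, which is exactly what happens when a stack trace is split between two batches.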
The logs are injected via Flume into a Kafka topic and received via `KafkaUtils.createDirectStream`.
With Spark Streaming, a stack trace might be cut in the middle and split across 2 (or more) distributed RDDs. I'd be lucky if this didn't happen...
My question is: how can I reconstruct the stack traces that were cut before processing them?
Should I pre-process the RDDs and create new ones containing what I'm waiting for? Should I reconstruct the stack traces via a global buffer? Should I somehow play with the offsets? How exactly?
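One way to try the "global buffer" idea, sketched under stated assumptions: the last record of each batch is held back, because it may still continue in the next batch, and is only emitted once the next 'normal' line shows up. It assumes that lines from one log source arrive in order through a single stream (for example, a single Kafka partition) and that normal lines start with a `yyyy-MM-dd` date. `StackTraceReassembler`, `process` and `flush` are hypothetical names, not a Spark or Kafka API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical "global buffer": reassembles multi-line records whose
 * lines are split across consecutive micro-batches.
 * Assumptions: lines arrive in order; a "normal" line starts with a
 * yyyy-MM-dd date, anything else continues the previous record.
 */
class StackTraceReassembler {
    private static final Pattern NORMAL_LINE = Pattern.compile("\\d{4}-\\d{2}-\\d{2} ");

    // Lines of the last, possibly incomplete record seen so far.
    private final List<String> pending = new ArrayList<>();

    /** Returns the records known to be complete after this batch. */
    List<String> process(List<String> batch) {
        List<String> complete = new ArrayList<>();
        for (String line : batch) {
            boolean normal = NORMAL_LINE.matcher(line).lookingAt();
            if (normal && !pending.isEmpty()) {
                // A new record starts, so the buffered one cannot grow any more.
                complete.add(String.join("\n", pending));
                pending.clear();
            }
            pending.add(line);
        }
        // Whatever remains in 'pending' may continue in the next batch: keep it.
        return complete;
    }

    /** Emits the buffered record, e.g. on shutdown or after a timeout. */
    List<String> flush() {
        List<String> rest = new ArrayList<>();
        if (!pending.isEmpty()) {
            rest.add(String.join("\n", pending));
            pending.clear();
        }
        return rest;
    }
}
```

The cost of this approach is that the last record of every batch is delayed by one batch interval (or until `flush()` is called). In a real job the buffer would also have to live somewhere that survives across batches; a plain static field on the executors does not guarantee that, since a given partition is not guaranteed to be processed by the same executor each time.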
Any ideas are welcome.
thanx,
--mike