java - How to handle multi-line logs with Spark Streaming?

March 15, 2012

I'm struggling to find a way to parse multi-line logs using Spark Streaming. I created a parser that takes an array of strings as its input parameter. When a multi-line stack trace is found, it loops over each line until it reaches a 'normal' line before processing it.
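To make the parser described above concrete, here is a minimal sketch in plain Java (no Spark dependency). It is not the original parser: the class name `MultiLineGrouper` is hypothetical, and it assumes that every 'normal' line starts with a `yyyy-MM-dd` timestamp, so any other line is treated as a continuation of the previous record. Real log layouts may need a different rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class MultiLineGrouper {
    // Assumption: a 'normal' line begins with a date such as "2012-03-15 ".
    private static final Pattern RECORD_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2}[ T]");

    static boolean startsRecord(String line) {
        return RECORD_START.matcher(line).find();
    }

    // Folds continuation lines (e.g. stack trace frames) into the record
    // that precedes them and returns one string per logical log event.
    static List<String> group(List<String> lines) {
        List<String> events = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (current != null && !startsRecord(line)) {
                // Continuation: append to the event being built.
                current.append('\n').append(line);
            } else {
                // New record: emit the previous event, start a new one.
                if (current != null) {
                    events.add(current.toString());
                }
                current = new StringBuilder(line);
            }
        }
        if (current != null) {
            events.add(current.toString());
        }
        return events;
    }
}
```

This only works when all lines of a stack trace arrive together and in order, which is exactly what cannot be taken for granted once the lines are spread over several RDDs.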

The logs are injected via Flume into a Kafka topic and received via KafkaUtils.createDirectStream.

When it comes to Spark Streaming, stack traces might be cut in the middle and end up spread across 2 (or more) distributed RDDs. I'd be lucky if that did not happen ...

My question is: how can I reconstruct the stack traces that were cut before processing them?

Should I pre-process the RDDs and create new ones containing what I'm waiting for? Should I reconstruct the stack traces via a global buffer? Should I somehow play with the offsets? How exactly?
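To illustrate what I mean by the buffer option, here is a rough sketch in plain Java, not a tested Spark job. The class name `CarryOverBuffer` is hypothetical, and it again assumes that a new record starts with a `yyyy-MM-dd` timestamp. The idea is to hold back the last, possibly incomplete, event of each batch until the next batch shows where it ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only. One instance is meant to
// track one ordered stream of lines (for example a single Kafka partition).
class CarryOverBuffer {
    // Assumption: a new record begins with a date such as "2012-03-15 ".
    private static final Pattern RECORD_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2}[ T]");

    // The event still being assembled; it may continue in the next batch.
    private StringBuilder pending = null;

    // Feeds one batch of lines and returns only the events known to be
    // complete. The trailing event is kept until a later line closes it.
    List<String> accept(List<String> batch) {
        List<String> complete = new ArrayList<>();
        for (String line : batch) {
            boolean startsRecord = RECORD_START.matcher(line).find();
            if (startsRecord || pending == null) {
                if (pending != null) {
                    complete.add(pending.toString());
                }
                pending = new StringBuilder(line);
            } else {
                pending.append('\n').append(line);
            }
        }
        return complete;
    }

    // Emits whatever is still held back, e.g. on shutdown or a timeout.
    List<String> flush() {
        List<String> rest = new ArrayList<>();
        if (pending != null) {
            rest.add(pending.toString());
            pending = null;
        }
        return rest;
    }
}
```

In a real job such state could not simply live in a driver-side global variable. It would have to be kept per ordered source (Kafka preserves order only within a partition), for instance through Spark's stateful operations such as updateStateByKey, and it would need some timeout so that the last event is not held back forever. I have not worked out those parts.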

Any ideas are welcome.

Thanks,

--mike
