Java 8 Streams: Read file word by word

March 15, 2011

I use Java 8 streams a lot to process files, but so far always line by line.

What I want is a function that gets a BufferedReader br and reads a specific number of words (separated by "\\s+"), leaving the BufferedReader at the exact position where that number of words was reached.

Right now I have a version that reads the file line by line:

    final int[] wordCount = {20};
    br.lines()
      .map(l -> l.split("\\s+"))
      .flatMap(Arrays::stream)
      .filter(s -> {
          // process s
          return --wordCount[0] == 0;
      })
      .findFirst();

This leaves the input stream at the start of the line after the one that contains the 20th word.
Is there a way to get a stream that reads less than a line from the input stream?

Edit

I am parsing a file in which the first word contains the number of words that follow. I read this word and then read in that number of words accordingly. The file contains multiple such sections, and each section is parsed by the function described above.

Having read all the helpful comments, it has become clear to me that a Scanner is the right choice for this problem, and that Java 9 will have a Scanner class that provides stream features (Scanner.tokens() and Scanner.findAll()).
Using streams the way I described gives me no guarantee that the reader will be at a specific position after the terminal operation of the stream (see the API docs). That makes streams the wrong choice for parsing a structure where you parse only one section and have to keep track of the position.
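
The Scanner approach mentioned above could look roughly like the following sketch. The class and method names (ScannerSections, readWords, readSections) are made up for illustration. Note that a Scanner buffers its input, so the underlying reader is still not left at an exact position; the sketch assumes that all further reading goes through the same Scanner instance.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerSections {
    /** Reads exactly n whitespace-separated words from the scanner. */
    static List<String> readWords(Scanner in, int n) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            words.add(in.next()); // throws NoSuchElementException if the input ends early
        }
        return words;
    }

    /** Parses "count word word ..." sections for as long as the next token is a number. */
    static List<List<String>> readSections(Scanner in) {
        List<List<String>> sections = new ArrayList<>();
        while (in.hasNextInt()) {
            sections.add(readWords(in, in.nextInt()));
        }
        return sections;
    }

    public static void main(String[] args) {
        String input = "2 short section 3 three words\nsection";
        try (Scanner in = new Scanner(new BufferedReader(new StringReader(input)))) {
            // prints [[short, section], [three, words, section]]
            System.out.println(readSections(in));
        }
    }
}
```

Because the Scanner keeps its own position, readWords can be called again later on the same instance and continues right after the last word it returned.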

Regarding your original problem: I assume that your file looks like this:

    5 a section of five words 3 three words
    section 2 short section 7 this section contains a lot
    of words

And you want to get output like this:

    [a, section, of, five, words]
    [three, words, section]
    [short, section]
    [this, section, contains, a, lot, of, words]

In general, the Stream API is badly suited to such problems. Writing a plain old loop looks like a better solution here. If you still want to see a Stream API based solution, I can suggest using the StreamEx library, which contains a headTail() method that allows you to write custom stream-transformation logic. Here's how your problem could be solved using headTail:

    /* Transform a Stream of words like 2, a, b, 3, c, d, e to
       a Stream of lists like [a, b], [c, d, e] */
    public static StreamEx<List<String>> records(StreamEx<String> input) {
        return input.headTail((count, tail) ->
            makeRecord(tail, Integer.parseInt(count), new ArrayList<>()));
    }

    private static StreamEx<List<String>> makeRecord(StreamEx<String> input, int count,
                                                     List<String> buf) {
        return input.headTail((head, tail) -> {
            buf.add(head);
            return buf.size() == count
                    ? records(tail).prepend(buf)
                    : makeRecord(tail, count, buf);
        });
    }

Usage example:

    String s = "5 a section of five words 3 three words\n"
            + "section 2 short section 7 this section contains a lot\n"
            + "of words";
    Reader reader = new StringReader(s);
    Stream<List<String>> stream = records(StreamEx.ofLines(reader)
                   .flatMap(Pattern.compile("\\s+")::splitAsStream));
    stream.forEach(System.out::println);

The result looks exactly like the desired output above. Replace reader with your BufferedReader or FileReader to read from an input file. The stream of records is lazy: at most one record is held by the stream at a time, and if you short-circuit, the rest of the input will not be read (although, of course, the current file line will be read to the end). The solution, while it looks recursive, does not eat stack or heap, so it works for huge files as well.
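
For comparison, the plain old loop mentioned earlier might look like the sketch below. It is only one possible shape for such a loop. The names LoopRecords and parse are hypothetical, and wrapping IOException in UncheckedIOException is a choice made here for brevity. Unlike the stream version, it reads the whole input and collects all records into a list.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LoopRecords {
    /** Reads all "count word word ..." records; a record may span several lines. */
    static List<List<String>> parse(BufferedReader br) {
        List<List<String>> records = new ArrayList<>();
        List<String> current = null; // record currently being filled
        int remaining = 0;           // words still missing from the current record
        try {
            String line;
            while ((line = br.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (word.isEmpty()) {
                        continue; // blank line
                    }
                    if (remaining == 0) {
                        // This word is the count that starts a new record.
                        remaining = Integer.parseInt(word);
                        current = new ArrayList<>();
                        if (remaining == 0) {
                            records.add(current); // empty record
                        }
                    } else {
                        current.add(word);
                        if (--remaining == 0) {
                            records.add(current);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String s = "5 a section of five words 3 three words\n"
                + "section 2 short section 7 this section contains a lot\n"
                + "of words";
        parse(new BufferedReader(new StringReader(s))).forEach(System.out::println);
    }
}
```

Run on the sample input, this prints the same four lists as the desired output above.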

Explanation:

The headTail() method takes a two-argument lambda which is executed at most once during the outer stream's terminal operation, when a stream element is requested. The lambda receives the first stream element (head) and a stream that contains all the other original elements (tail). The lambda should return a new stream, which will be used instead of the original one. In records we have:

    return input.headTail((count, tail) ->
        makeRecord(tail, Integer.parseInt(count), new ArrayList<>()));

The first element of the input is count: convert it to a number, create an empty ArrayList and call makeRecord for the tail. Here's the makeRecord helper method implementation:

    return input.headTail((head, tail) -> {

The first stream element is head; add it to the current buffer:

    buf.add(head);

Is the target buffer size reached?

    return buf.size() == count

If yes, call records for the tail again (to process the next record, if any) and prepend a single element to the resulting stream: the current buffer.

            ? records(tail).prepend(buf)

Otherwise, call myself for the tail (to add more elements to the buffer).

            : makeRecord(tail, count, buf);
    });
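
The control flow described above (treat the head as a count, collect that many elements from the tail, then start over) can be mimicked with a hand-written Iterator using only the JDK. This is merely an analogy for the logic, not how StreamEx implements headTail, and the RecordIterator class is a hypothetical name introduced for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class RecordIterator implements Iterator<List<String>> {
    private final Iterator<String> words;

    RecordIterator(Iterator<String> words) {
        this.words = words;
    }

    @Override
    public boolean hasNext() {
        return words.hasNext();
    }

    @Override
    public List<String> next() {
        // Head of the remaining input: the number of words in this record.
        int count = Integer.parseInt(words.next());
        List<String> buf = new ArrayList<>();
        // Keep taking words until the target buffer size is reached.
        while (buf.size() < count) {
            buf.add(words.next());
        }
        return buf;
    }

    /** Wraps the iterator in a sequential Stream of records. */
    static Stream<List<String>> records(Iterator<String> words) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(new RecordIterator(words), Spliterator.ORDERED),
                false);
    }

    public static void main(String[] args) {
        Iterator<String> words = Arrays.asList("2", "a", "b", "3", "c", "d", "e").iterator();
        // prints [a, b] and then [c, d, e]
        records(words).forEach(System.out::println);
    }
}
```

Each call to next() builds a single record, so only one buffer is being filled at any time.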
