algorithm - How to optimise code that eliminates stopwords from a text in Perl


I have code that reads a text file and a second file containing a list of stopwords, and removes the stopwords from the text. The code takes a very long time to execute. How can I optimise it?

#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;

print "choose name of result file\n";
my $fic = <STDIN>;
open( FIC1, ">$fic" );

my @stops;
my @file;

open( STOPWORD, "c:\\ats\\stop-ats" ) or die "can't open: $!\n";
@stops = <STOPWORD>;
while (<STOPWORD>)    # read each line into $_
{
    chomp @stops;     # remove newline from $_
    push @stops, $_;  # add line to @triggers
}
close STOPWORD;

open( FILE, "c:\\ats\\ats" ) or die "cannot open file";
while (<FILE>) {
    my $line = $_;

    #print $line;
    my @words = split( /\s/, $line );
    foreach my $word (@words) {
        chomp($word);
        foreach my $wor (@stops) {
            chomp($wor);
            if ( $word eq $wor ) {

                #print "$wor\n";
                $word = '';
            }
        }
        print FIC1 $word;
        print FIC1 " ";
    }
    print FIC1 "\n";
}
exit 0;

The code takes a long time to process the text file. How can I optimise it?

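The main reason your code is slow is that it loops over the whole array of stopwords for each word in the input, so the work grows with the number of words times the number of stopwords. The standard approach here is to use a hash of stopwords rather than an array: checking whether a word is a hash key takes roughly constant time, however long the list is.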
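A minimal sketch of the hash approach, using a small made-up stopword list. The names `@stop_list`, `%is_stop`, and `remove_stops` are illustrative and do not come from the question:

```perl
use strict;
use warnings;

# Hypothetical stopword list; in the real script it is read from a file.
my @stop_list = qw(the a of and);

# Build the lookup table once: each stopword becomes a hash key.
my %is_stop = map { $_ => 1 } @stop_list;

# Return the words of a whitespace-separated line that are not stopwords.
sub remove_stops {
    my ($line) = @_;

    # One hash lookup per word instead of one pass over the array per word.
    return join ' ', grep { !$is_stop{$_} } split ' ', $line;
}

print remove_stops('the cat and the hat'), "\n";    # prints "cat hat"
```

The hash is built once, before any text is processed, so its cost is paid a single time rather than once per word.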

Also, it's clearer to chomp the whole array once, when you're sure no new elements are coming into it, rather than chomping its elements again and again.
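One common way to write this is to read and chomp in a single statement. The sketch below uses an in-memory string in place of the stopword file, assuming one word per line:

```perl
use strict;
use warnings;

# In-memory stand-in for the stopword file (assumed: one word per line).
my $data = "the\nand\nof\n";
open my $fh, '<', \$data or die "can't open in-memory file: $!";

# Read every line in list context, then strip all newlines in one call.
chomp( my @stops = <$fh> );
close $fh;

print join( ',', @stops ), "\n";    # prints "the,and,of"
```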

As noted in the comments, the while loop over the stopword filehandle never executes, because you have already exhausted the filehandle by reading it in list context on the previous line.
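The effect is easy to reproduce with an in-memory filehandle. This is only a demonstration with made-up data, not part of the fix:

```perl
use strict;
use warnings;

my $data = "the\nand\n";
open my $fh, '<', \$data or die "can't open in-memory file: $!";

my @all = <$fh>;    # list context: reads every remaining line

my $extra = 0;
while (<$fh>) {     # the handle is already at end of file
    $extra++;       # never reached
}
close $fh;

print scalar(@all), " lines read, $extra loop iterations\n";
# prints "2 lines read, 0 loop iterations"
```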

You haven't provided any example input. If you just want to exclude stopwords from a file of words, that's fine. If you want to process real text, though, you'll have to do more work to find the occurrences of stopwords: they can appear in a different case, and they aren't separated by whitespace only, since there's punctuation, too.

You can start here:

#!/usr/bin/perl
use warnings;
use strict;

# Load the stopwords into a hash for fast lookups.
open my $stop, '<', 'stop-ats' or die "can't open: $!\n";
my %stops;
while (<$stop>) {
    chomp;
    $stops{$_} = 1;
}

open my $text, '<', 'ats' or die "cannot open file: $!";
while (<$text>) {
    # The capturing group keeps the non-letter separators in the list.
    my @words = split /([[:alpha:]]+)/;
    for my $word (@words) {
        print $word unless $stops{ lc $word };
    }
}
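To see what the `split` with a capturing group does to a single line, here is the same logic wrapped in a hypothetical helper, `strip_stops`, and run on a made-up sentence. It assumes the stopword list is stored in lower case, as the `lc` lookup in the script above does:

```perl
use strict;
use warnings;

# Hypothetical helper: remove stopwords from one line, keeping punctuation.
sub strip_stops {
    my ( $line, $stops ) = @_;

    # The capturing group makes split return the words *and* the text
    # between them, so spaces and punctuation survive unchanged.
    return join '', grep { !$stops->{ lc $_ } } split /([[:alpha:]]+)/, $line;
}

my %stops = map { $_ => 1 } qw(the and);
print strip_stops( "The cat, and the hat.\n", \%stops );
```

Only the stopwords themselves are dropped, so the spaces that surrounded them remain in the output. If that matters, the whitespace can be squeezed in a later step.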
