regex - Sliding window pattern match in Perl or MATLAB regular expressions
I am trying to use either Perl or MATLAB to parse a few numbers out of a single line of text. My text line is:
t10_t20_t30_t40_
Now in MATLAB, I used the following script:
str = 't10_t20_t30_t40_';
a = regexp(str,'t(\d+)_t(\d+)','match')
and it returns
a = 't10_t20' 't30_t40'
What I want is for it to also return 't20_t30', since it is also a match. Why doesn't regexp scan it?
So I turned to Perl and wrote the following:
#!/usr/bin/perl -w
$str = "t10_t20_t30_t40_";
while ($str =~ /(t\d+_t\d+)/g) {
    print "$1\n";
}
and the result is the same as in MATLAB:
t10_t20
t30_t40
But I wanted "t20_t30" to be in the results as well.
Can anyone tell me how to accomplish that? Thanks!
[Update with solution]: With help from colleagues, I identified a solution using the so-called "look-around assertion" afforded by Perl.
#!/usr/bin/perl -w
$str = "t10_t20_t30_t40_";
while ($str =~ m/(?=(t\d+_t\d+))/g) {
    print "$1\n";
}
The key is to use a "zero-width look-ahead assertion" in Perl. When Perl (and other similar packages) uses a regexp to scan a string, it does not re-scan the part that was already consumed by the last match. So in the first Perl example above, t20_t30 will never show up in the results, because its leading t20 was consumed as the tail of t10_t20. To capture it, we need to use a zero-width lookahead to scan the string, producing matches that do not exclude any substrings from subsequent searches (see the working code above). With the "global" modifier appended to the search (i.e. m//g), the search starts at the zero-th position and advances one position at a time for as long as it can, so every position in the string is tried.
This is explained in more detail in this blog post.
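To make the "does not re-scan" behaviour concrete, here is a minimal sketch that records where each consuming match starts and where the next search resumes. The sub name consuming_matches is only illustrative; $-[0] and pos() are standard Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect each match of the consuming (non-lookahead) pattern together with
# the offset where it starts and the offset where the next /g search resumes.
# The sub name is illustrative only.
sub consuming_matches {
    my ($str) = @_;
    my @found;
    while ($str =~ /(t\d+_t\d+)/g) {
        # $-[0] is the start of the match; pos() is where /g will resume.
        push @found, [$1, $-[0], pos($str)];
    }
    return @found;
}

for my $m (consuming_matches('t10_t20_t30_t40_')) {
    printf "%s starts at %d, next search resumes at %d\n", @$m;
}
```

This prints "t10_t20 starts at 0, next search resumes at 7" and "t30_t40 starts at 8, next search resumes at 15". The candidate t20_t30 would have to start at offset 4, which lies before the resume point 7, so it is never tried.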
The expression (?=t\d+_t\d+) matches any 0-width string that is followed by t\d+_t\d+, and this creates the actual "sliding window". It effectively returns all t\d+_t\d+ patterns in $str without any exclusion, since every position in $str is a 0-width string. The additional parentheses in (?=(t\d+_t\d+)) capture the pattern while doing the sliding matching, and thus return the desired sliding-window outcome.
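As a rough illustration of the sliding window, the sketch below pairs each captured substring with the offset of the zero-width match that produced it. The sub name sliding_matches is hypothetical and not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every (possibly overlapping) occurrence of t<digits>_t<digits>,
# paired with its starting offset. The sub name is hypothetical.
sub sliding_matches {
    my ($str) = @_;
    my @found;
    # The lookahead consumes nothing, so the overall match has zero width.
    # Perl will not accept a second empty match at the same position, so the
    # next successful match has to come from a later offset.
    while ($str =~ /(?=(t\d+_t\d+))/g) {
        push @found, [$1, $-[0]];
    }
    return @found;
}

printf "%s at offset %d\n", @$_ for sliding_matches('t10_t20_t30_t40_');
```

This prints t10_t20 at offset 0, t20_t30 at offset 4, and t30_t40 at offset 8. The attempt at offset 12 fails because t40_ is not followed by another t\d+.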
Using Perl:
#!/usr/bin/perl
use Data::Dumper;
use Modern::Perl;
# In list context, m//g returns every captured group at once.
my $re = qr/(?=(t\d+_t\d+))/;
my @l = 't10_t20_t30_t40' =~ /$re/g;
print Dumper(\@l);
Output:
$VAR1 = [
          't10_t20',
          't20_t30',
          't30_t40'
        ];