regex - Remove partial duplicate lines from a text file in Notepad++
I have a huge list like the example below. I need to remove lines 1, 3, 6 and 8 because they are partial duplicates of other lines, and I need to keep the longest line in each case.
compaq presario a940es notebook pc
compaq presario a940es notebook pc - ku048ear
hp pavilion dv7-1210ea notebook pc
hp pavilion dv7-1210ea notebook pc - ng385ea#abu
hp pavilion dv7-1210ea notebook pc - ng385ear
hp pavilion dv7-1210ed notebook pc
hp pavilion dv7-1210ed notebook pc - na048ea#abh
hp pavilion dv7-1210ed notebook pc - na048ea
The final result I need is:
compaq presario a940es notebook pc - ku048ear
hp pavilion dv7-1210ea notebook pc - ng385ea#abu
hp pavilion dv7-1210ea notebook pc - ng385ear
hp pavilion dv7-1210ed notebook pc - na048ea#abh
If you don't need to keep the original sequence of lines, try this:
- Sort the lines: Edit -> Line Operations -> Sort Lines Lexicographically Ascending
- Be sure the last line ends with a newline.
- Now do a find/replace:
- Find what: ^(.*)\r\n(\1.*?\r\n)
- Replace with: \2
- In the lower left, select Regular expression and leave ". matches newline" unchecked, so that .* stays within a single line.
- If your line endings are \n, use \n instead of the two \r\n in Find what.
- Hit Replace or Replace All, and keep hitting it until there is nothing left to replace. The status bar in the Replace dialog tells you when that is the case.
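If you want to check the find/replace outside Notepad++, here is a minimal Python sketch of the same idea, assuming \n line endings. The function name drop_prefix_lines is made up for illustration, and Python's re module stands in for Notepad++'s regex engine, so treat it as an approximation of the dialog rather than an exact replica.

```python
import re

# Same pattern as in "Find what", written for \n line endings.
# re.MULTILINE lets ^ match at the start of every line; without re.DOTALL
# the .* cannot cross a line break.
PATTERN = re.compile(r"^(.*)\n(\1.*?\n)", re.MULTILINE)


def drop_prefix_lines(lines):
    """Sort the lines, then repeat the replace until nothing is left to replace."""
    # The last line must end with a newline, as in the steps above.
    text = "\n".join(sorted(lines)) + "\n"
    while True:
        # Replace "short line + longer line" with just the longer line.
        text, count = PATTERN.subn(r"\2", text)
        if count == 0:  # equivalent to the status bar reporting no more matches
            break
    return text.splitlines()
```

Calling drop_prefix_lines on the eight lines from the question should leave the four lines listed as the desired result, in sorted order.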
How it works:
- The sorting puts the duplicates in sequence, with the longest "duplicate" last.
- The find/replace considers two consecutive lines where the first line is the beginning of the second line, and replaces both lines with the second line. (That means that if you have three duplicates, the first replace leaves the second and third lines standing, so you need to hit Replace All again.)
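The reasoning above can also be sketched without a regex. This is only an illustration in Python, not part of the Notepad++ procedure, and the function name keep_longest is hypothetical. It relies on the same property of the sort: every line that begins with a given line sorts directly after it, so comparing each line with its immediate successor is enough.

```python
def keep_longest(lines):
    """Drop every line that is the beginning of (or equal to) the next sorted line."""
    ordered = sorted(lines)
    kept = []
    # Pair each line with the line that follows it; the last line has no successor.
    for current, following in zip(ordered, ordered[1:] + [None]):
        if following is not None and following.startswith(current):
            continue  # a longer (or identical) version comes next, so skip this one
        kept.append(current)
    return kept
```

Unlike Replace All, this needs only one pass, because a line is judged against its successor in the sorted list rather than being consumed together with it.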