regex - Remove partial duplicate lines from a text file in Notepad++


I have a huge list like the example below. I need to remove lines 1, 3, 6 and 8 because they are partial duplicates of other lines; in each case I need to keep the longest line.

compaq presario a940es notebook pc
compaq presario a940es notebook pc - ku048ear
hp pavilion dv7-1210ea notebook pc
hp pavilion dv7-1210ea notebook pc - ng385ea#abu
hp pavilion dv7-1210ea notebook pc - ng385ear
hp pavilion dv7-1210ed notebook pc
hp pavilion dv7-1210ed notebook pc - na048ea#abh
hp pavilion dv7-1210ed notebook pc - na048ea

The final result I need is:

compaq presario a940es notebook pc - ku048ear
hp pavilion dv7-1210ea notebook pc - ng385ea#abu
hp pavilion dv7-1210ea notebook pc - ng385ear
hp pavilion dv7-1210ed notebook pc - na048ea#abh

If you don't need to keep the original order of the lines, try this:

  • Sort the lines: Edit -> Line Operations -> Sort Lines Lexicographically Ascending
  • Make sure the last line ends with a newline.
  • Now use Find/Replace:
    • find what: ^(.*)\r\n(\1.*?\r\n)
    • replace with: \2
    • In the lower left, select Regular expression and leave ". matches newline" unchecked, so that (.*) stays within a single line.
    • If the file uses \n line endings, use \n in place of both \r\n in Find what.
    • Hit Replace or Replace All repeatedly until there is nothing left to replace; the status bar of the Replace dialog will tell you when that is the case.
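
For readers who would rather script this than click through the dialog, the steps above can be sketched in Python. This is only an illustration under some assumptions: the function name `remove_partial_duplicates` is made up for this example, the text uses \n line endings, and Python's `re` module stands in for the Notepad++ regex engine, which may differ in details.

```python
import re

# The "Find what" pattern, written for "\n" line endings.
# re.MULTILINE lets ^ match at the start of every line; re.DOTALL is
# deliberately not set in this sketch, so "." never crosses a line break.
PATTERN = re.compile(r"^(.*)\n(\1.*?\n)", re.MULTILINE)


def remove_partial_duplicates(text: str) -> str:
    """Sort the lines, then repeat the replacement until nothing changes."""
    lines = sorted(text.splitlines())
    # Every line, including the last one, must end with a newline.
    result = "".join(line + "\n" for line in lines)
    while True:
        # Replace "short line + longer line" with just the longer line.
        result, count = PATTERN.subn(r"\2", result)
        if count == 0:  # same as "nothing left to replace" in the dialog
            return result
```

The loop plays the role of hitting Replace All again and again: each pass removes at most one line per matched pair, so a chain of three or more partial duplicates needs more than one pass.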

How it works:

  1. Sorting puts the duplicates next to each other, with the longest "duplicate" last.
  2. The find/replace looks at two lines at a time: if the first line is the beginning of the second line, it replaces both lines with the second line. (That means that with 3 duplicates, the first pass removes only the first line and leaves the second and third standing, so you need to run Replace All again.)
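
The same idea can be written without a regex, which may make the role of the sort easier to see. The sketch below is not part of the Notepad++ procedure; `keep_longest` is a hypothetical helper name, and plain Python string comparison stands in for the editor's lexicographic sort.

```python
def keep_longest(lines: list[str]) -> list[str]:
    """Drop each line that is the beginning of the line sorted right after it."""
    ordered = sorted(lines)
    kept = []
    # Pair every line with its successor; the last line has none.
    for current, following in zip(ordered, ordered[1:] + [None]):
        if following is not None and following.startswith(current):
            continue  # a longer (or identical) version follows, so skip this one
        kept.append(current)
    return kept
```

Because each line is compared with its neighbour in the sorted list rather than with the result of an earlier replacement, one pass is enough here. The repeated Replace All is needed in the editor only because each regex match consumes both lines of a pair, so the second line cannot start the next match in the same pass.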
