performance - Python - Exclude contents of one file from another / removing duplicate lines amongst two files
First off, I'm using Python 2.7.9. I'm trying to find an efficient way to compare the lines of one text file (file A) with the lines of another text file (file B) and write the lines that are unique to file A to a new file (file A\B).
Actually, I've already written a short script for this, but it is beyond slow. I need the script to be able to handle files of 70 MB (each, A and B), which is unthinkable with this 'bad boy':
import string

naked = string.strip
kiss = ''.join

def main():
    list1 = raw_input("Enter the name of the .txt-file to clean!\n")
    list2 = raw_input("Enter the name of the .txt-file to exclude!\n")
    action(list1, list2)
    raw_input("Done!\nPress [ENTER] to exit!")

def action(list1, list2):
    f = open(kiss([list1, '.txt']), "r")
    g = open(kiss([list2, '.txt']), "r")
    h = open(kiss([list1, '_without_', list2, '.txt']), "w")
    h_w = h.write
    reset = g.seek
    found = False
    for i in f:
        # Re-reads all of file B for every line of file A (the slow part).
        found = [True for j in g if naked(i) == naked(j)]
        if not found:
            h_w(kiss([naked(i), '\n']))
        else:
            found = False
        reset(0)  # rewind file B for the next line of file A
    f.close()
    g.close()
    h.close()

main()
Yeah... does anyone have an idea how to do this more efficiently?! Thanks in advance!
def read_file(filename):
    # Read the whole file into a list of stripped lines.
    with open(filename) as src:
        return [line.strip() for line in src.readlines()]

def main():
    list1 = raw_input("Enter the name of the .txt-file to clean!\n")
    list2 = raw_input("Enter the name of the .txt-file to exclude!\n")
    file1 = read_file(list1)
    file2 = read_file(list2)
    file3 = open('new_file.txt', 'w')
    for line in file1:
        if line not in file2:
            file3.write(str(line) + '\n')  # write to the new file
    file3.close()
    print 'completed'

main()
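I'm not sure this is the fastest way to do the trick. You can also use the "diff" or "comm" Linux commands to get the required output (note that "comm" expects both input files to be sorted).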
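Since the question is about speed, here is a minimal sketch of a set-based variant, which may be considerably faster because looking up a line in a set takes roughly constant time instead of scanning a whole list or file. The function name exclude_lines and its parameters are hypothetical and not part of either script above. It assumes the stripped lines of file B fit in memory, and it runs on both Python 2.7 and Python 3.

```python
def exclude_lines(path_a, path_b, path_out):
    """Write the lines of path_a that do not occur in path_b to path_out.

    Lines are compared after stripping surrounding whitespace, as in the
    scripts above. Returns the number of lines written.
    (Hypothetical helper, for illustration only.)
    """
    # Load file B into a set once; membership tests on a set are O(1) on average.
    with open(path_b) as src_b:
        excluded = set(line.strip() for line in src_b)

    written = 0
    # Stream file A line by line, so only file B has to be held in memory.
    with open(path_a) as src_a, open(path_out, 'w') as dst:
        for line in src_a:
            stripped = line.strip()
            if stripped not in excluded:
                dst.write(stripped + '\n')
                written += 1
    return written
```

It could be called as exclude_lines('a.txt', 'b.txt', 'a_without_b.txt'). Lines that occur several times in file A are written as often as they occur, and the order of file A is kept. Only lines that also appear in file B are dropped.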