Python: how to slice a CSV file with respect to a column other than the first?


I have a CSV file with a number of columns and 500000 rows. I need to slice the file with respect to the second column, which holds the year, while keeping the other columns:

col1   col2   col3   col4   col5   col6   col7
xxx    1986   xxx    xxx    xxx    xxx    xxx
xxx    1992   xxx    xxx    xxx    xxx    xxx
xxx    1998   xxx    xxx    xxx    xxx    xxx
...    ...    ...    ...    ...    ...    ...
xxx    2015   xxx    xxx    xxx    xxx    xxx
xxx    1984   xxx    xxx    xxx    xxx    xxx

My question: how can I produce a CSV file out of this that contains only the rows whose value in the second column is >= 1992?

Desired output:

col1   col2   col3   col4   col5   col6   col7
xxx    1992   xxx    xxx    xxx    xxx    xxx
xxx    1998   xxx    xxx    xxx    xxx    xxx
xxx    2015   xxx    xxx    xxx    xxx    xxx

My attempt is below, but I got stuck at the point where I should insert an if condition linked to the second column, and I don't know how to do that:

    from __future__ import division
    import numpy
    from numpy import *
    import csv
    from collections import *
    import os
    import glob

    directorypath = raw_input('working directory: ')  # folder where the csv files are located
    for i, file in enumerate(os.listdir(directorypath)):  # loops over the folder with the csv files
        if file.endswith(".csv"):  # checks whether it is a csv file
            filename = os.path.basename(file)  # takes the file name without any directory part
            filelabel = file  # takes the filename
            strpath = os.path.join(directorypath, file)  # builds the complete path to the csv file
            x = numpy.genfromtxt(strpath, delimiter=',')[:, 7]  # I got stuck here

You can iterate over the rows of the CSV and check whether the value in col2 is >= the year you are interested in. If it is, add the row to a new list, then pass the new list to a CSV writer. You can call the function in a loop to create new CSVs from all files ending with the .csv extension.

You have to pass in working_directory and year. working_directory is the folder of CSVs you want to process.

    import csv
    import os

    def make_csv(in_file, out_file, year):
        # 'rb'/'wb' are the Python 2 modes for csv files; on Python 3 use
        # open(in_file, newline='') and open(out_file, 'w', newline='') instead.
        with open(in_file, 'rb') as csv_in_file:
            csv_row_list = []
            first_row = True
            csv_reader = csv.reader(csv_in_file)
            for row in csv_reader:
                if first_row:
                    # always keep the header row
                    csv_row_list.append(row)
                    first_row = False
                else:
                    # row[1] is the second column (the year)
                    if int(row[1]) >= year:
                        csv_row_list.append(row)

        with open(out_file, 'wb') as csv_out_file:
            csv_writer = csv.writer(csv_out_file)
            csv_writer.writerows(csv_row_list)

    # working_directory and year must be defined before this loop runs
    for root, directories, files in os.walk(working_directory):
        for f in files:
            if f.endswith('.csv'):
                in_file = os.path.join(root, f)
                out_file = os.path.join(root, os.path.splitext(f)[0] + '_new' + os.path.splitext(f)[1])
                make_csv(in_file, out_file, year)
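With 500000 rows it may be worth writing each matching row as soon as it is read instead of collecting them in a list first. The following is only a sketch of that variant for Python 3, not part of the original answer. The names filter_rows_by_year and filter_csv_file, the year_index parameter, and the sample data are made up for illustration, and it assumes a comma-delimited file with a single header row.

```python
import csv
import io


def filter_rows_by_year(csv_in, csv_out, min_year, year_index=1):
    """Copy the header and every row whose year column is >= min_year.

    csv_in and csv_out are open text-mode file objects.
    Returns the number of data rows written (header not counted).
    """
    reader = csv.reader(csv_in)
    writer = csv.writer(csv_out)
    kept = 0
    for line_number, row in enumerate(reader):
        if line_number == 0:
            writer.writerow(row)  # the header row is always kept
        elif row and int(row[year_index]) >= min_year:
            # non-empty row whose year passes the threshold
            writer.writerow(row)
            kept += 1
    return kept


def filter_csv_file(in_path, out_path, min_year):
    """Hypothetical file-based wrapper using the Python 3 csv file modes."""
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        return filter_rows_by_year(src, dst, min_year)


# Small in-memory demonstration with made-up data
sample = io.StringIO("col1,col2,col3\na,1986,x\nb,1992,y\nc,2015,z\n")
result = io.StringIO()
kept_count = filter_rows_by_year(sample, result, 1992)
```

Under those assumptions, filter_csv_file(in_file, out_file, 1992) could be called from the same os.walk loop in place of make_csv.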
