Python: how to slice a CSV file with respect to a column other than the first?
I have a CSV file with a number of columns and 500000 rows. I need to slice the file with respect to the second column, which holds the year, while keeping all the other columns:

col1 col2 col3 col4 col5 col6 col7
xxx  1986 xxx  xxx  xxx  xxx  xxx
xxx  1992 xxx  xxx  xxx  xxx  xxx
xxx  1998 xxx  xxx  xxx  xxx  xxx
...  ...  ...  ...  ...  ...  ...
xxx  2015 xxx  xxx  xxx  xxx  xxx
xxx  1984 xxx  xxx  xxx  xxx  xxx

My question: how can I produce a CSV file out of this that keeps only the rows whose value in the second column is >= 1992?
Desired output:

col1 col2 col3 col4 col5 col6 col7
xxx  1992 xxx  xxx  xxx  xxx  xxx
xxx  1998 xxx  xxx  xxx  xxx  xxx
xxx  2015 xxx  xxx  xxx  xxx  xxx

My attempt is this, but I got stuck at the point where I should insert an if condition linked to the second column, and I don't know how to do that:
    from __future__ import division
    import numpy
    from numpy import *
    import csv
    from collections import *
    import os
    import glob

    directorypath = raw_input('working directory: ')  # folder where the csv files are located
    for i, file in enumerate(os.listdir(directorypath)):  # loops over the folder with csv files
        if file.endswith(".csv"):  # checks whether the entry is a csv file
            filename = os.path.basename(file)  # file name without its directory
            filelabel = file  # takes the filename
            strpath = os.path.join(directorypath, file)  # complete path to the csv file
            x = numpy.genfromtxt(strpath, delimiter=',')[:, 7]  # I got stuck here
You can iterate over the rows of the CSV and check whether the value in col2 is >= the year you are interested in. If it is, add the row to a new list, then pass the new list to the CSV writer. You can call the function in a loop to create new CSVs for all files ending with the .csv extension.

You have to pass in working_directory, the folder of CSVs you want to process, and year.
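    import csv
    import os

    def make_csv(in_file, out_file, year):
        # Python 2: the csv module expects files opened in binary mode
        with open(in_file, 'rb') as csv_in_file:
            csv_row_list = []
            first_row = True
            csv_reader = csv.reader(csv_in_file)
            for row in csv_reader:
                if first_row:
                    # always keep the header row
                    csv_row_list.append(row)
                    first_row = False
                else:
                    # row[1] is the second column (the year)
                    if int(row[1]) >= year:
                        csv_row_list.append(row)
        with open(out_file, 'wb') as csv_out_file:
            csv_writer = csv.writer(csv_out_file)
            csv_writer.writerows(csv_row_list)

    # example values taken from the question; adjust as needed
    working_directory = raw_input('working directory: ')
    year = 1992

    for root, directories, files in os.walk(working_directory):
        for f in files:
            if f.endswith('.csv'):
                in_file = os.path.join(root, f)
                # write the result next to the input as <name>_new.csv
                out_file = os.path.join(root, os.path.splitext(f)[0] + '_new' + os.path.splitext(f)[1])
                make_csv(in_file, out_file, year)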
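
The code in this answer follows Python 2 conventions (binary file modes for the csv module). On Python 3 the csv module expects text-mode files opened with newline='', so the same idea might look like the sketch below. The names filter_csv_by_year, filter_directory, min_year and year_col are made up for this illustration and do not come from the question or the answer. Unlike the answer's version, this one writes each row as soon as it is read instead of collecting all 500000 rows in a list first, and it silently skips rows whose year is missing or not an integer, which is an assumption about how bad rows should be handled.

```python
import csv
import os


def filter_csv_by_year(in_path, out_path, min_year, year_col=1):
    """Copy the header plus every row whose year column is >= min_year.

    Returns the number of data rows written. All names here are
    hypothetical; year_col=1 means "second column", as in the question.
    """
    kept = 0
    # Python 3: open csv files in text mode with newline=''
    with open(in_path, newline='') as src, \
            open(out_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return kept  # empty input file, nothing to copy
        writer.writerow(header)
        for row in reader:
            try:
                year = int(row[year_col])
            except (IndexError, ValueError):
                continue  # assumption: skip short rows and non-integer years
            if year >= min_year:
                writer.writerow(row)
                kept += 1
    return kept


def filter_directory(working_directory, min_year):
    """Apply filter_csv_by_year to every .csv file under working_directory."""
    for root, _dirs, files in os.walk(working_directory):
        for name in files:
            base, ext = os.path.splitext(name)
            # skip outputs of an earlier run so they are not filtered again
            if ext != '.csv' or base.endswith('_new'):
                continue
            in_path = os.path.join(root, name)
            out_path = os.path.join(root, base + '_new' + ext)
            filter_csv_by_year(in_path, out_path, min_year)
```

With the data from the question, calling filter_directory(working_directory, 1992) would write a <name>_new.csv file next to each input file, containing the header and only the rows from 1992 onwards.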