Change information in a CSV file using info from another CSV file in Python
I'm trying to edit a CSV file using information from another one. It doesn't seem simple to me, since I have to filter on multiple things. Let me explain the problem.
I have 2 CSV files, let's say patch.csv and origin.csv. The output CSV file should have the same pattern as origin.csv, but with corrected values.
I want to replace the trip_headsign column fields in origin.csv using the forward_line_name column in patch.csv if the direction_id field in the origin.csv row is 0, or using backward_line_name if direction_id is 1.
I want to do this only if the part of the line_id value in patch.csv between the two ":" symbols is the same as the part of the route_id value in origin.csv before the ":" symbol.
I know how to replace a whole line, but not just parts of it, especially since I have to match on only part of a value.
Here is a sample of origin.csv:
    route_id,service_id,trip_id,trip_headsign,direction_id,block_id
    210210109:001,2913,70405957139549,70405957,0,
    210210109:001,2916,70405961139553,70405961,1,
And a sample of patch.csv:
    line_id,line_code,line_name,forward_line_name,forward_direction,backward_line_name,backward_direction,line_color,line_sort,network_id,commercial_mode_id,contributor_id,geometry_id,line_opening_time,line_closing_time
    oif:100110010:10oif439,10,boulogne pont de saint-cloud - gare d'austerlitz,boulogne / pont de st cloud - gare d'austerlitz,oif:sa:8754700,gare d'austerlitz - boulogne / pont de st cloud,oif:sa:59400,dfb039,91,oif:439,metro,oif,geometry:line:100110010:10,05:30:00,25:47:00
    oif:210210109:001oif30,001,ffourches longueville provins,place mérot - gare de longueville,,gare de longueville - place mérot,oif:sa:63:49,000000 1,oif:30,bus,oif,,05:39:00,19:50:00
Each file has hundreds of lines that I need to parse and edit this way.
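To make the matching rule concrete, here is how I picture it on the sample values above. This is only a sketch; the helper names patch_key and origin_key are made up for illustration:

```python
def patch_key(line_id):
    # Text between the first and second ':' of a patch.csv line_id.
    return line_id.split(':')[1]


def origin_key(route_id):
    # Text before the first ':' of an origin.csv route_id.
    return route_id.split(':')[0]


line_id = 'oif:210210109:001oif30'   # from the patch.csv sample
route_id = '210210109:001'           # from the origin.csv sample

# The two rows belong together when both keys are equal.
rows_match = patch_key(line_id) == origin_key(route_id)
print(patch_key(line_id), origin_key(route_id), rows_match)
```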
Based on mhopeng's answer, I obtained this code:
    #!/usr/bin/env python2
    from __future__ import print_function
    import fileinput
    import sys

    # first get the route info from patch.csv
    f = open(sys.argv[1])
    d = open(sys.argv[2])
    # ignore header line
    #line1 = f.readline()
    #line2 = d.readline()
    # get a line of data
    for line1 in f.readline():
        line1 = f.readline().split(',')
        route_id = line1[0].split(':')[1] # '210210109'
        route_forward = line1[3]
        route_backward = line1[5]
        line_code = line1[1]

        # process origin.csv and replace lines in-place
        for line in fileinput.input(sys.argv[2], inplace=1):
            line2 = d.readline().split(',')
            num_route = line2[0].split(':')[0]
            # prevent lines with the same route_id but a different code from being considered the same line
            if line.startswith(route_id) and (num_route == line_code):
                if line.startswith(route_id):
                    newline = line.split(',')
                    if newline[4] == 0:
                        newline[3] = route_backward
                    else:
                        newline[3] = route_forward
                    print('\t'.join(newline),end="")
            else:
                print(line,end="")
But unfortunately, this doesn't put the right forward_line_name or backward_line_name in trip_headsign (it is always the forward one), and it triggers an error before it finishes parsing the file:
    Traceback (most recent call last):
      File "./gtfs_enhancer_headsigns.py", line 28, in <module>
        if newline[4] == 0:
    IndexError: list index out of range
Thanks for any help on this.
pandas is convenient for handling CSV files. You could use this:
    import pandas as pd

    # read trip_headsign as text so that line names can be written into it
    origin = pd.read_csv('origin.csv', index_col=None, dtype={'trip_headsign': str})
    patch = pd.read_csv('patch.csv', index_col=None)

    # create match keys for matching origin.csv rows to patch.line_id
    patch['match_key'] = [x.split(':')[1] for x in patch.line_id.values]
    origin['match_key'] = [x.split(':')[0] for x in origin.route_id.values]

    for i, key in enumerate(origin.match_key.values):
        p = patch[patch.match_key == key]
        # only patch when exactly one patch.csv row matches
        if len(p) == 1:
            if origin.direction_id[i] == 0:
                origin.loc[i, 'trip_headsign'] = p.forward_line_name.values[0]
            elif origin.direction_id[i] == 1:
                origin.loc[i, 'trip_headsign'] = p.backward_line_name.values[0]

    # drop the helper column so the output keeps the origin.csv layout
    origin.drop(columns='match_key').to_csv('new_origin.csv', index=False)
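If you would rather stay with the standard library, the same idea can probably be sketched with the csv module. Treat this as a sketch under assumptions, not a drop-in script: it assumes Python 3, the function names build_lookup and patch_headsigns are made up for illustration, and the inline sample is trimmed to the three patch.csv columns that are actually used. Unlike the loop above, a key that appears more than once in patch.csv keeps the last row instead of being skipped.

```python
import csv
import io


def build_lookup(patch_file):
    """Map the middle part of line_id to (forward_line_name, backward_line_name)."""
    lookup = {}
    for row in csv.DictReader(patch_file):
        key = row['line_id'].split(':')[1]
        lookup[key] = (row['forward_line_name'], row['backward_line_name'])
    return lookup


def patch_headsigns(origin_file, patch_file, out_file):
    """Copy origin_file to out_file, replacing trip_headsign where a patch row matches."""
    lookup = build_lookup(patch_file)
    reader = csv.DictReader(origin_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames,
                            lineterminator='\n')
    writer.writeheader()
    for row in reader:
        names = lookup.get(row['route_id'].split(':')[0])
        if names is not None:
            # csv yields strings, so compare against '0' and '1', not integers
            if row['direction_id'] == '0':
                row['trip_headsign'] = names[0]
            elif row['direction_id'] == '1':
                row['trip_headsign'] = names[1]
        # rows are written comma-separated again, so the layout is unchanged
        writer.writerow(row)


# Small in-memory demo based on the samples from the question.
ORIGIN = (
    "route_id,service_id,trip_id,trip_headsign,direction_id,block_id\n"
    "210210109:001,2913,70405957139549,70405957,0,\n"
    "210210109:001,2916,70405961139553,70405961,1,\n"
)
PATCH = (
    "line_id,forward_line_name,backward_line_name\n"
    "oif:210210109:001oif30,place mérot - gare de longueville,"
    "gare de longueville - place mérot\n"
)

out = io.StringIO()
patch_headsigns(io.StringIO(ORIGIN), io.StringIO(PATCH), out)
result = out.getvalue()
print(result)
```

For real files, you would pass file objects opened with newline='' (as the csv documentation recommends) instead of the StringIO objects, and write to a new file rather than editing origin.csv in place.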