Change information in a CSV file using info from another one in Python


I'm trying to edit a CSV file using information from another one. It doesn't seem simple to me, since I have to filter on multiple things. Let me explain the problem.

I have 2 CSV files, let's call them patch.csv and origin.csv. The output CSV file should have the same pattern as origin.csv, but with corrected values.

I want to replace the trip_headsign column fields in origin.csv using the forward_line_name column in patch.csv if the direction_id field in the origin.csv row is 0, or using backward_line_name if direction_id is 1.

I want to do this only if the part of the line_id value in patch.csv between the two ":" symbols is the same as the part of the route_id value in origin.csv before the ":" symbol.
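The matching rule can be sketched as two small helpers. This is only an illustration: the function names `patch_key` and `origin_key` are made up here, and the sketch assumes that every line_id contains at least two ":" and every route_id at least one. The values come from the origin.csv and patch.csv samples.

```python
def patch_key(line_id):
    """Return the part of a patch.csv line_id between the first two ':'."""
    return line_id.split(':')[1]


def origin_key(route_id):
    """Return the part of an origin.csv route_id before the first ':'."""
    return route_id.split(':')[0]


# values taken from the samples of patch.csv and origin.csv
print(patch_key('oif:210210109:001oif30'))   # 210210109
print(origin_key('210210109:001'))           # 210210109

# a patch line for another route must not match
print(patch_key('oif:100110010:10oif439') == origin_key('210210109:001'))  # False
```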

I know how to replace a whole line, but not parts of it, especially when I only have part of the value to match on.

Here is a sample of origin.csv:

    route_id,service_id,trip_id,trip_headsign,direction_id,block_id
    210210109:001,2913,70405957139549,70405957,0,
    210210109:001,2916,70405961139553,70405961,1,

And a sample of patch.csv:

    line_id,line_code,line_name,forward_line_name,forward_direction,backward_line_name,backward_direction,line_color,line_sort,network_id,commercial_mode_id,contributor_id,geometry_id,line_opening_time,line_closing_time
    oif:100110010:10oif439,10,boulogne pont de saint-cloud - gare d'austerlitz,boulogne / pont de st cloud - gare d'austerlitz,oif:sa:8754700,gare d'austerlitz - boulogne / pont de st cloud,oif:sa:59400,dfb039,91,oif:439,metro,oif,geometry:line:100110010:10,05:30:00,25:47:00
    oif:210210109:001oif30,001,ffourches longueville provins,place mérot - gare de longueville,,gare de longueville - place mérot,oif:sa:63:49,000000   1,oif:30,bus,oif,,05:39:00,19:50:00

Each file has hundreds of lines that I need to parse and edit this way.

Based on mhopeng's answer, I obtained this code:

    #!/usr/bin/env python2
    from __future__ import print_function
    import fileinput
    import sys

    # first get the route info from patch.csv
    f = open(sys.argv[1])
    d = open(sys.argv[2])
    # ignore header line
    #line1 = f.readline()
    #line2 = d.readline()
    # get line of data
    for line1 in f.readline():
        line1 = f.readline().split(',')
        route_id = line1[0].split(':')[1] # '210210109'
        route_forward = line1[3]
        route_backward = line1[5]
        line_code = line1[1]

    # process origin.csv and replace lines in-place
        for line in fileinput.input(sys.argv[2], inplace=1):
            line2 = d.readline().split(',')
            num_route = line2[0].split(':')[0]
    # prevent lines with the same route_id but a different code from being considered the same line
            #if line.startswith(route_id) and (num_route == line_code):
            if line.startswith(route_id):
                newline = line.split(',')
                if newline[4] == 0:
                    newline[3] = route_backward
                else:
                    newline[3] = route_forward
                print('\t'.join(newline),end="")
            else:
                print(line,end="")

But unfortunately, it doesn't put the right forward_line_name or backward_line_name into trip_headsign (it is always the forward one), and it triggers an error before it finishes parsing the file:

    Traceback (most recent call last):
      File "./gtfs_enhancer_headsigns.py", line 28, in <module>
        if newline[4] == 0:
    IndexError: list index out of range

Thanks for any help on this.
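Both symptoms can be reproduced in isolation. The snippet below is a minimal sketch, not the script from the question. It assumes that the "always forward" behaviour comes from comparing a string field with the integer 0. It also assumes that the IndexError appears when a line already rewritten with tabs is split on "," again. The second point is only one plausible cause, since the failing line is not shown.

```python
# one data line from the origin.csv sample
line = '210210109:001,2913,70405957139549,70405957,0,\n'
fields = line.split(',')

# fields read from a text file are strings, so comparing with the
# integer 0 is always False and the else branch always runs
print(fields[4] == 0)      # False
print(fields[4] == '0')    # True

# once a line has been written back joined with tabs, it no longer
# contains commas, so splitting on ',' yields a single field
rewritten = '\t'.join(fields)
print(len(rewritten.split(',')))   # 1

try:
    rewritten.split(',')[4]
except IndexError as exc:
    print(exc)             # list index out of range
```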

pandas is convenient for handling CSV files. You could use this:

    import pandas as pd

    # read trip_headsign as text so that it can be overwritten with line names
    origin = pd.read_csv('origin.csv', index_col=None, dtype={'trip_headsign': str})
    patch = pd.read_csv('patch.csv', index_col=None)

    # create match_keys for matching origin.route_id to patch.line_id
    patch['match_key'] = [x.split(':')[1] for x in patch.line_id.values]
    origin['match_key'] = [x.split(':')[0] for x in origin.route_id.values]

    for i, key in enumerate(origin.match_key.values):
        p = patch[patch.match_key == key]
        # only patch the row when exactly one line in patch.csv matches
        if len(p) == 1:
            if origin.direction_id[i] == 0:
                origin.loc[i, 'trip_headsign'] = p.forward_line_name.values[0]
            elif origin.direction_id[i] == 1:
                origin.loc[i, 'trip_headsign'] = p.backward_line_name.values[0]

    # drop the helper column so the output keeps the layout of origin.csv
    origin.drop(columns='match_key').to_csv('new_origin.csv', index=False)
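For comparison, the same replacement can be sketched with only the standard library csv module. This is an alternative illustration, not the code from the answer. The function name `patch_headsigns` is hypothetical, the inputs are any file-like objects, and the patch.csv sample is trimmed to the columns that are used. Rows whose key is missing from patch.csv, or appears there more than once, are left untouched, which is an assumption mirroring the `len(p) == 1` check.

```python
import csv
import io


def patch_headsigns(origin_file, patch_file, out_file):
    """Copy origin rows to out_file, replacing trip_headsign from the patch."""
    # map the middle part of line_id to (forward, backward) names
    names = {}
    ambiguous = set()
    for row in csv.DictReader(patch_file):
        key = row['line_id'].split(':')[1]
        if key in names:
            ambiguous.add(key)
        names[key] = (row['forward_line_name'], row['backward_line_name'])

    reader = csv.DictReader(origin_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames,
                            lineterminator='\n')
    writer.writeheader()
    for row in reader:
        key = row['route_id'].split(':')[0]
        if key in names and key not in ambiguous:
            forward, backward = names[key]
            # csv values are strings, so compare with '0' and '1'
            if row['direction_id'] == '0':
                row['trip_headsign'] = forward
            elif row['direction_id'] == '1':
                row['trip_headsign'] = backward
        writer.writerow(row)


# small in-memory demo built from the samples in the question
origin_csv = io.StringIO(
    "route_id,service_id,trip_id,trip_headsign,direction_id,block_id\n"
    "210210109:001,2913,70405957139549,70405957,0,\n"
    "210210109:001,2916,70405961139553,70405961,1,\n"
)
patch_csv = io.StringIO(
    "line_id,line_code,line_name,forward_line_name,forward_direction,"
    "backward_line_name\n"
    "oif:210210109:001oif30,001,ffourches longueville provins,"
    "place mérot - gare de longueville,,gare de longueville - place mérot\n"
)
result = io.StringIO()
patch_headsigns(origin_csv, patch_csv, result)
print(result.getvalue())
```

With real files, the csv documentation recommends opening them with newline='', for example open('origin.csv', newline='').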
