python - group by same partial string of pandas dataframe column -


i have several csv files , each 1 contains 1 stock price in 1 month , has millions of data. raw csv data data like:

aa_candy.csv

index   companyname      time       price   1      aa candy    030101090355   1.78   2      aa candy    030101091533   1.79   ....... 333498   aa candy    031231145556   2.18 

bb_cookie.csv

   1     bb cookie   030101090225   3.20    2     bb cookie   030101090845   3.14   ....... 391373   bb cookie   031231145958   3.88 

i use python , pandas process data, after load , combine of datafiles, have dataframe like:

frame:

index   companyname      time       price   1      aa candy    030101090355   1.78   2      aa candy    030101091533   1.79   ....... 333498   aa candy    031231145556   2.18 333499   bb cookie   030101090225   3.20 333500   bb cookie   030101090845   3.14   ....... 712871   bb cookie   031231145958   3.88 

the time 031231145958 represent 2013-12-31 14:59:58

now want highest price , final price in every 1 hour of each company, , output file like:

range_start   aa candy/max    aa candy/close    bb cookie/max     bb cookie/close 0301010900     1.79              1.77            3.20              3.10 ........ 0312311400     2.24              2.18            3.88              3.88 

therefore want groupby companyname , first 8 character of time data of same company in 1 hour, calculation find max price value , final price value of each company , output outcome same start hour in 1 row; let companyname/max or close new column name.

because new in pandas , dataframe, have following questions:

  1. how group data first 8 character of time column(object) , expected value?
  2. how form new output dataframe/matrix expected output?

thanks!!

perform groupby on company name , first 8 characters of string timestamp (i.e. date plus hour). use agg on price custom functions each (first, max, min , last). unstack company names, swap levels of company names , open/high/low/close , optionally sort symbols.

gb = (df.groupby(['companyname', df.time.str[:8]])         .price         .agg({'open': 'first',                'high': np.max,                'low': np.min,                'close': 'last'})[['open', 'high', 'low', 'close']]         .unstack('companyname')) gb.columns = gb.columns.swaplevel(0, 1) >>> gb.sortlevel(level=0, axis=1) companyname aa candy                   bb cookie                                   open  high   low close      open  high   low close time                                                               03010109        1.78  1.79  1.78  1.79      3.20  3.20  3.14  3.14 03123114        2.18  2.18  2.18  2.18      3.88  3.88  3.88  3.88 

Comments

Popular posts from this blog

javascript - jQuery: Add class depending on URL in the best way -

caching - How to check if a url path exists in the service worker cache -

Redirect to a HTTPS version using .htaccess -