python - Efficiently finding overlap between many date ranges
How can I efficiently find overlapping dates between many date ranges?
I have a pandas DataFrame containing information on the daily warehouse stock of many products. There are only records for the dates on which the stock changed.
    import pandas as pd

    df = pd.DataFrame({'product': ['a', 'a', 'a', 'b', 'b', 'b'],
                       'stock': [10, 0, 10, 5, 0, 5],
                       'date': ['2016-01-01', '2016-01-05', '2016-01-15',
                                '2016-01-01', '2016-01-10', '2016-01-20']})
    df['date'] = pd.to_datetime(df['date'])

    Out[4]:
            date product  stock
    0 2016-01-01       a     10
    1 2016-01-05       a      0
    2 2016-01-15       a     10
    3 2016-01-01       b      5
    4 2016-01-10       b      0
    5 2016-01-20       b      5
From this data I want to identify the number of days on which the stock of all products was 0. In the example this is 5 days (from 2016-01-10 to 2016-01-14).
I tried resampling the dates to create one record for every day and then comparing day by day. This works, but it creates a very large DataFrame that I can hardly keep in memory, because my data contains many dates on which the stock does not change.
Is there a more memory-efficient way to calculate the overlaps other than creating a record for every date and comparing day by day?
Maybe I can somehow create a period representation for the time range implicit in every record and then compare the periods across products? Another option would be to first subset the time periods in which a product has 0 stock (relatively few) and then apply the resampling only to that subset of the data. What other, more efficient ways are there?
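The period idea described above could be sketched roughly as follows. This is only one possible reading of it, not a tested solution for the real data: each record is assumed to hold until the same product's next record, a trailing zero-stock record with no later record is treated as open-ended, and the names `end`, `events`, `n_zero`, `gap` and `total_all_zero` are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['a', 'a', 'a', 'b', 'b', 'b'],
    'stock': [10, 0, 10, 5, 0, 5],
    'date': pd.to_datetime(['2016-01-01', '2016-01-05', '2016-01-15',
                            '2016-01-01', '2016-01-10', '2016-01-20']),
})

# Assumption: each record holds until the same product's next record,
# so it covers the half-open period [date, end).
df = df.sort_values(['product', 'date'])
df['end'] = df.groupby('product')['date'].shift(-1)

# Keep only the zero-stock periods (relatively few rows).
zero = df[df['stock'] == 0]
closed = zero.dropna(subset=['end'])  # a final zero record has no known end

# Sweep line: +1 where a zero-stock period starts, -1 where it ends.
events = pd.concat([
    pd.Series(1, index=pd.DatetimeIndex(zero['date'])),
    pd.Series(-1, index=pd.DatetimeIndex(closed['end'])),
]).groupby(level=0).sum().sort_index()

n_zero = events.cumsum()  # products at zero stock from each event date on
gap = n_zero.index.to_series().diff().shift(-1)  # time until the next event

# Sum the gaps during which every product is at zero stock.
total_all_zero = gap[n_zero == df['product'].nunique()].sum()
print(total_all_zero.days)
```

The intermediate objects only grow with the number of zero-stock records, not with the number of calendar days. Time after the last event is not counted, because its length is unknown.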
You can pivot the table using the dates as the index and the products as columns, fill the NaNs with the previous values, convert to daily frequency, and look for the rows that have 0s in all columns.
    ptable = (df.pivot(index='date', columns='product', values='stock')
                .ffill()                       # carry the last known stock forward
                .asfreq('D', method='ffill'))  # one row per calendar day
    cond = ptable.apply(lambda x: (x == 0).all(), axis='columns')
    print(ptable.index[cond])

    DatetimeIndex(['2016-01-10', '2016-01-11', '2016-01-12', '2016-01-13',
                   '2016-01-14'],
                  dtype='datetime64[ns]', name=u'date', freq='D')
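If only the number of days is needed, the conversion to daily frequency can probably be skipped. The sketch below is a variation on the pivot approach, not part of the original answer: it keeps one row per change date and weights each all-zero row by the time until the next change date. The names `all_zero`, `span` and `days_all_zero` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'product': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'stock': [10, 0, 10, 5, 0, 5],
                   'date': ['2016-01-01', '2016-01-05', '2016-01-15',
                            '2016-01-01', '2016-01-10', '2016-01-20']})
df['date'] = pd.to_datetime(df['date'])

# One row per change date, one column per product. Forward-filling carries
# each product's last known stock to dates on which only other products changed.
ptable = df.pivot(index='date', columns='product', values='stock').ffill()

# Change dates from which every product is at zero stock.
all_zero = (ptable == 0).all(axis='columns')

# Length of the span each row stands for: time until the next change date.
# The last row has no successor, so its span is NaT and sum() skips it.
span = ptable.index.to_series().diff().shift(-1)

days_all_zero = span[all_zero].sum().days
print(days_all_zero)
```

The table then has as many rows as there are distinct change dates rather than calendar days, which should help when the stock changes rarely.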