Pig: How do I add a new value for a column for a subset of data?
I have data as follows:

patient_id lab_value
1, 10
1, 3
2, 1
2, 4
3, 5
3, 10
3, 2

What I'd like to do is find the max lab_value per patient_id grouping and calculate the difference between each lab_value and that max, as follows:

patient_id lab_value lab_diff
1, 10, 0
1, 3, 7 (10 - 3)
2, 1, 3
2, 4, 0
3, 5, 5 (10 - 5)
3, 10, 0
3, 2, 8 (10 - 2)

How can I do this?
Steps:
- Load the data.
- Group it by id.
- Get the max lab value for each id.
- Take the distinct (id, max value) rows.
- Join the data with the max lab values on id.
- Generate the diff value as max value - lab value.

Pig script:
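-- Load the comma-separated data with a typed schema
a = LOAD 'test1.txt' USING PigStorage(',') AS (id:int, lab_value:int);
-- Group by id and compute the max lab value per id
b = GROUP a BY id;
c = FOREACH b GENERATE group AS id, MAX(a.lab_value) AS max_value;
c1 = DISTINCT c;
-- Join each original row with its id's max, then compute the difference
d = JOIN a BY id, c1 BY id;
e = FOREACH d GENERATE a::id, a::lab_value, (c1::max_value - a::lab_value) AS diff_value;
DUMP e;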
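Because GROUP already keeps each id's rows together in a bag, the join can probably be avoided by flattening that bag next to its MAX. This is a minimal sketch of that variation, not the method given above; it assumes the same test1.txt and schema, and the alias names grp, flat, and diffs are made up for illustration:

```pig
a = LOAD 'test1.txt' USING PigStorage(',') AS (id:int, lab_value:int);
grp = GROUP a BY id;
-- FLATTEN(a) emits one row per original record, with the group's MAX repeated on each
flat = FOREACH grp GENERATE FLATTEN(a), MAX(a.lab_value) AS max_value;
diffs = FOREACH flat GENERATE a::id, a::lab_value, (max_value - a::lab_value) AS diff_value;
DUMP diffs;
```

This form needs no DISTINCT, since GROUP yields exactly one row per id. The order of rows printed by DUMP is not guaranteed in either version.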
