Apache Pig: How do I add a new value for a column for a subset of data?


I have data as follows:

patient_id, lab_value
1, 10
1, 3
2, 1
2, 4
3, 5
3, 10
3, 2

What I'd like to do is find the max lab_value within each patient_id group, then calculate the difference between each lab_value and that max, as follows:

patient_id, lab_value, lab_diff
1, 10, 0
1, 3, 7 (10 - 3)
2, 1, 3
2, 4, 0
3, 5, 5 (10 - 5)
3, 10, 0
3, 2, 8 (10 - 2)

How can I do this?

Steps

  • Load the data.
  • Group the rows by id.
  • Get the max lab value for each id.
  • Take the distinct (id, max value) pairs.
  • Join the original data with the max lab values on id.
  • Generate the diff value as max value - lab value.

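Pig script

-- Load (id, lab_value) pairs from a comma-separated file.
a = LOAD 'test1.txt' USING PigStorage(',') AS (id:int, lab_value:int);
-- Group the rows by id and compute the max lab_value per group.
b = GROUP a BY id;
c = FOREACH b GENERATE group AS id, MAX(a.lab_value) AS max_value;
c1 = DISTINCT c;
-- Join each original row with its group's max, then take the difference.
d = JOIN a BY id, c1 BY id;
e = FOREACH d GENERATE a::id, a::lab_value, (c1::max_value - a::lab_value) AS diff_value;
DUMP e;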
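
A possible shortcut, assuming the same input file and schema as the script above: GROUP already keeps each group's original rows in a bag, so the DISTINCT and JOIN steps can be skipped by flattening that bag next to its max. This is only a sketch and has not been checked against a particular Pig version. The aliases labs, grouped, with_max and diffs are made up for illustration, and lab_diff is chosen to match the column name in the question.

```pig
-- Same input and schema as the script above (assumed).
labs = LOAD 'test1.txt' USING PigStorage(',') AS (id:int, lab_value:int);
grouped = GROUP labs BY id;
-- FLATTEN(labs) re-emits every original row alongside its group's max.
with_max = FOREACH grouped GENERATE FLATTEN(labs), MAX(labs.lab_value) AS max_value;
-- Flattened fields keep the labs:: prefix; max_value is a plain column.
diffs = FOREACH with_max GENERATE labs::id AS id, labs::lab_value AS lab_value, (max_value - labs::lab_value) AS lab_diff;
DUMP diffs;
```

Both versions compute max value - lab value per row. This one avoids the join, which may matter on large inputs.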

Result


