Apache Pig: JOIN two data sets on the basis of a string-matching condition
I am new to Pig and have two data sets, "highspender" and "feedback".
highspender:
price,fname,lname
$50,jack,brown
$30,rovin,pall
feedback:
date,name,rate
2015-01-02,jack b brown,5
2015-01-02,pall,4
Now I have to join these two data sets on the basis of name. The condition is that the fname or lname of highspender should match the name of feedback. How can I join these two data sets? Any ideas?
You can try the script below. You will need to replace the names according to your data.
-- Load both data sets as comma-separated text.
highs = LOAD 'highs' USING PigStorage(',') AS (price:chararray, fname:chararray, lname:chararray);
feedback = LOAD 'feeds' USING PigStorage(',') AS (date:chararray, name:chararray, rate:chararray);
-- Exact match of fname or lname against the whole feedback name.
out = JOIN highs BY fname, feedback BY name;
out1 = JOIN highs BY lname, feedback BY name;
final_out = UNION out, out1;
For further details, you can refer to the Pig reference manual.
Edit
As per the comment, the script below joins the data using a string function:
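highs = LOAD 'highs' USING PigStorage(',') AS (price:chararray, fname:chararray, lname:chararray);
feedback = LOAD 'feeds' USING PigStorage(',') AS (date:chararray, name:chararray, rate:chararray);
-- Pair every highspender row with every feedback row.
crossout = CROSS highs, feedback;
-- REPLACE strips lname (or fname) from the feedback name; if the result differs
-- from the original, the feedback name contains it.
final_lname = FILTER crossout BY (REPLACE(feedback::name, highs::lname, '') != feedback::name);
final_fname = FILTER crossout BY (REPLACE(feedback::name, highs::fname, '') != feedback::name);
-- A pair that matches on both fname and lname appears twice in the union.
final = UNION final_lname, final_fname;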
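CROSS pairs every highspender row with every feedback row, which can get expensive on large inputs. If matching on whole words is good enough, one possible alternative is to split the feedback name into words with TOKENIZE and use ordinary joins. This is only a sketch: it assumes the same files and schema as above, and the relation names words, by_fname, by_lname, fname_hits, lname_hits, matched and result are made up for illustration. Unlike the REPLACE version, it will not match a name that only appears inside a longer word.

```pig
highs = LOAD 'highs' USING PigStorage(',') AS (price:chararray, fname:chararray, lname:chararray);
feedback = LOAD 'feeds' USING PigStorage(',') AS (date:chararray, name:chararray, rate:chararray);

-- One row per word of the feedback name, e.g. 'jack b brown' gives jack, b, brown.
words = FOREACH feedback GENERATE date, name, rate, FLATTEN(TOKENIZE(name)) AS word;

-- Equi-join on the single word instead of filtering a cross product.
by_fname = JOIN highs BY fname, words BY word;
by_lname = JOIN highs BY lname, words BY word;

-- Drop the helper column so both branches have the same plain schema.
fname_hits = FOREACH by_fname GENERATE highs::price AS price, highs::fname AS fname, highs::lname AS lname, words::date AS date, words::name AS name, words::rate AS rate;
lname_hits = FOREACH by_lname GENERATE highs::price AS price, highs::fname AS fname, highs::lname AS lname, words::date AS date, words::name AS name, words::rate AS rate;

-- A pair matched on both fname and lname would appear twice, so de-duplicate.
matched = UNION fname_hits, lname_hits;
result = DISTINCT matched;
```

With the sample data in the question, jack brown should pair with "jack b brown" and rovin pall with "pall". The comparison is case-sensitive, so the names may need to be normalised with LOWER first if the casing differs between the two files.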