Pig Join using Merge clause

 Pig Join using Merge clause (hands-on example with explanation)

--------------------------------------
salrec = load '/home/horton/Desktop/Sales.csv' using PigStorage(',')
 AS
(custid:long, prod_id:long, qty_pur:int, pur_date:datetime, sale_id:long);

custrec = load '/home/horton/Desktop/Customer.csv' using PigStorage(',')
 AS
(fn:chararray, ln:chararray, status:chararray, ph:chararray, custid1:long, add:chararray);

pd_rec = load  '/home/horton/Desktop/Product.csv' using PigStorage(',')
 AS
 (pdname:chararray, pddesc:chararray, pdcat:chararray, pdgty:long, pdid:long, pkwith:chararray);



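-- Full outer join of sales with products on the product id (regular join, not merge).
slrec_pdrec = join salrec by prod_id full, pd_rec by pdid ;

-- Sort the joined relation on the key used by the merge join below.
slrec_pdrec1 = order slrec_pdrec by custid;

-- salrec1 is sorted here but is not used by the join below.
salrec1 = ORDER salrec by custid;
-- Sort the customer relation on its join key as well.
custrec1 = ORDER custrec by custid1;

-- Merge join: both inputs are already sorted on their join keys.
crs_rec = join custrec1 by custid1,slrec_pdrec1 by custid USING 'merge';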

STORE  crs_rec into '/home/horton/Desktop/pigout17-orderby-joinfullmerge' using PigStorage();


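:Merge supports only two-way joins, that is, exactly two relations in the join statement
:Every relation must be sorted with an ORDER clause on its join key before the merge join; otherwise it does not work
:LEFT OUTER, RIGHT OUTER, or FULL cannot be specified in the statement that uses the merge clause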

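In a merge join, the right input is sampled to create an index in the first MapReduce job. A second MapReduce job is then started with the left relation as its input, and each map uses the index to seek to the matching position in the right input before joining.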
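
The two phases described above can be sketched in plain Python. This is only a simplified model, not Pig's actual implementation: the function names build_index and merge_join, the sampling interval, and the in-memory lists standing in for files are all assumptions made for illustration. It also seeks once per left record, which is a simplification of how a map task would walk its input.

```python
from bisect import bisect_left

def build_index(right_sorted, every=2):
    """Phase 1 (hypothetical model): sample every `every`-th record of the
    sorted right input and remember its key and its offset."""
    return [(right_sorted[i][0], i) for i in range(0, len(right_sorted), every)]

def merge_join(left_sorted, right_sorted, index):
    """Phase 2 (hypothetical model): stream the left input and use the index
    to find where to start reading the sorted right input."""
    index_keys = [key for key, _ in index]
    joined = []
    for lkey, lval in left_sorted:
        # Seek to the last sampled record whose key is strictly smaller than
        # lkey, so that unsampled duplicates of lkey are not skipped.
        pos = bisect_left(index_keys, lkey) - 1
        start = index[pos][1] if pos >= 0 else 0
        for rkey, rval in right_sorted[start:]:
            if rkey > lkey:
                break  # right input is sorted, so no later record can match
            if rkey == lkey:
                joined.append((lkey, lval, rval))
    return joined

# Both inputs must already be sorted on the join key.
left = [(1, "x"), (2, "y"), (2, "z"), (3, "w")]
right = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]

index = build_index(right, every=2)
result = merge_join(left, right, index)
print(index)
print(result)
```

The early break in the inner loop is only valid because the right input is sorted, which is the reason the ORDER statements are needed before the merge join.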
