Pig Join Using skewed

Pig Join Using skewed with datasets (hands-on explanation)

--------------------------
salrec = load '/home/horton/Desktop/Sales.csv' using PigStorage(',')
AS
(custid:long, prod_id:long, qty_pur:int, pur_date:datetime, sale_id:long);

custrec = load '/home/horton/Desktop/Customer.csv' using PigStorage(',')
AS
(fn:chararray, ln:chararray, status:chararray, ph:chararray, custid1:long, add:chararray);

pd_rec = load '/home/horton/Desktop/Product.csv' using PigStorage(',')
AS
(pdname:chararray, pddesc:chararray, pdcat:chararray, pdgty:long, pdid:long, pkwith:chararray);

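-- Full outer join of sales with products on the product id (default join strategy).
slrec_pdrec = join salrec by prod_id full, pd_rec by pdid;

-- Full outer join of customers with the sales/product result on the customer id,
-- this time using the skewed join strategy.
crs_rec = join custrec by custid1 full, slrec_pdrec by custid USING 'skewed';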


STORE crs_rec into '/home/horton/Desktop/pigc25' using PigStorage();

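Note: in a skewed join, Pig samples the left input to build a histogram of the join keys and streams the right input. Use it when the data is heavily skewed, or when the records for a single key are too large to fit in memory.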
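
The memory point in the note can be tuned. A minimal sketch, assuming your Pig version accepts this property through a `set` statement in the script (otherwise it can be passed as a Java property when launching Pig). The value 0.3 is only an illustrative starting point, not a recommendation from this post:

```pig
-- Assumption: pig.skewedjoin.reduce.memusage is the fraction of reducer heap
-- that the skewed join may use (default 0.5). A lower value makes Pig spread
-- a heavy key over more reducers, at the price of more copying.
set pig.skewedjoin.reduce.memusage 0.3;

-- Same skewed join as in the script above.
crs_rec = join custrec by custid1 full, slrec_pdrec by custid using 'skewed';
```

The best value depends on the heap size, the number of columns and how skewed the keys are, so it is worth trying a few settings on your own data.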

To keep all the records from both inputs, use full. A skewed join supports inner joins as well as left outer, right outer and full outer joins.
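
As a rough sketch of the other variants, the statements below reuse custrec and slrec_pdrec from the script above. The alias names on the left-hand side (cust_left, cust_right, cust_inner) are made up for illustration:

```pig
-- Left outer: keep every customer, even those with no matching sale.
cust_left = join custrec by custid1 left outer, slrec_pdrec by custid using 'skewed';

-- Right outer: keep every sales/product row, even those with no matching customer.
cust_right = join custrec by custid1 right outer, slrec_pdrec by custid using 'skewed';

-- Inner (no keyword): keep only the rows whose customer id appears on both sides.
cust_inner = join custrec by custid1, slrec_pdrec by custid using 'skewed';
```

Only the keyword after the first join key changes; the USING 'skewed' clause stays the same in every case.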

A skewed join can only be applied to 2-way joins, that is, joins of exactly two relations.
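
Because of this restriction, joining the three files has to be written as a chain of 2-way joins. A minimal sketch, assuming salrec, pd_rec and custrec are loaded as in the script above; the aliases sales_products and all_rec are illustrative names, and making the first step skewed as well is optional:

```pig
-- Step 1: sales with products (two relations, so 'skewed' is allowed).
sales_products = join salrec by prod_id full, pd_rec by pdid using 'skewed';

-- Step 2: customers with the result of step 1 (again exactly two relations).
all_rec = join custrec by custid1 full, sales_products by custid using 'skewed';
```

Each statement names exactly two relations, so each one satisfies the 2-way rule on its own.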
