Pig Join Using 'skewed' with Datasets (Hands-On Explanation)
--------------------------
salrec = load '/home/horton/Desktop/Sales.csv' using PigStorage(',')
AS
(custid:long, prod_id:long, qty_pur:int, pur_date:datetime, sale_id:long);
custrec = load '/home/horton/Desktop/Customer.csv' using PigStorage(',')
AS
(fn:chararray, ln:chararray, status:chararray, ph:chararray, custid1:long, add:chararray);
pd_rec = load '/home/horton/Desktop/Product.csv' using PigStorage(',')
AS
(pdname:chararray, pddesc:chararray, pdcat:chararray, pdgty:long, pdid:long, pkwith:chararray);
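-- Full outer join of sales with products (regular join, keeps unmatched rows from both sides).
slrec_pdrec = join salrec by prod_id full, pd_rec by pdid;
-- Full outer skewed join of customers with the sales/product result.
crs_rec = join custrec by custid1 full, slrec_pdrec by custid USING 'skewed';
STORE crs_rec into '/home/horton/Desktop/pigc25' using PigStorage();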
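Note: In a skewed join, Pig samples the left input to build a histogram of the join keys, and the right input is streamed. Use it when the join keys are heavily skewed and the records for a single key are too large to fit in memory.
To keep all records from both inputs, use full. A skewed join also works as an inner, left outer, or right outer join.
A skewed join can only be applied to 2-way joins, so the three inputs above are joined in two separate join statements, and only the second one uses 'skewed'.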
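
The outer-join variants mentioned in the notes can be sketched as follows. This is a minimal illustration, not part of the original script: it reuses the same Sales.csv and Customer.csv files and schemas, and the relation names cust_left and cust_right are made up for the example. The comment about pig.skewedjoin.reduce.memusage refers to the tuning property described in the Pig documentation; whether you need to change it depends on your data.

```pig
-- Optional tuning knob (see the Pig docs): pig.skewedjoin.reduce.memusage
-- is the fraction of reducer heap available for the join (default 0.5).

salrec = load '/home/horton/Desktop/Sales.csv' using PigStorage(',')
    AS (custid:long, prod_id:long, qty_pur:int, pur_date:datetime, sale_id:long);
custrec = load '/home/horton/Desktop/Customer.csv' using PigStorage(',')
    AS (fn:chararray, ln:chararray, status:chararray, ph:chararray, custid1:long, add:chararray);

-- Left outer: keep every customer, even those with no sales.
cust_left = join custrec by custid1 left outer, salrec by custid using 'skewed';

-- Right outer: keep every sale, even those with no matching customer.
cust_right = join custrec by custid1 right outer, salrec by custid using 'skewed';

-- Each skewed join takes exactly two inputs; a third relation has to be
-- joined in a separate statement, as in the script above.

-- Print the joined schema (fields are prefixed, e.g. custrec::custid1).
describe cust_left;
describe cust_right;
```

Because the paths are local files, a script like this would typically be run in local mode (pig -x local). Put the relation whose keys are skewed on the left, since that is the input Pig samples.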