Running Pig in local mode while reading files from HDFS and storing results in HDFS (problem explanation)
----------------------
Step 1: When Pig runs in local mode (pig -x local), it can still read files from the distributed file system and store results in HDFS, provided each path is an absolute path that includes a scheme and an authority (for example, hdfs://localhost:8020/...).
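As a minimal sketch of the difference, the two loads below can sit in the same local-mode session. The aliases local_sales and hdfs_sales are illustrative names only, and the first load assumes a hypothetical local copy of Sales.csv in the directory Pig was started from.

```pig
-- Local mode: a path with no scheme resolves against the local file system.
-- Assumption: a local copy of Sales.csv exists in the working directory.
local_sales = load 'Sales.csv' using PigStorage(',')
    AS (custid:long, prod_id:long, qty_pur:int, pur_date:datetime, sale_id:long);

-- Same session: the scheme (hdfs) and authority (localhost:8020) point the load at HDFS.
hdfs_sales = load 'hdfs://localhost:8020/user/hadoop/Sales.csv' using PigStorage(',')
    AS (custid:long, prod_id:long, qty_pur:int, pur_date:datetime, sale_id:long);
```

Only the path changes between the two statements; the loader and the schema stay the same.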
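-- Load the sales records from HDFS using a fully qualified URI.
salrec = load 'hdfs://localhost:8020/user/hadoop/Sales.csv' using PigStorage(',')
AS
(custid:long, prod_id:long, qty_pur:int, pur_date:datetime, sale_id:long);
-- Load the customer records.
custrec = load 'hdfs://localhost:8020/user/hadoop/Customer.csv' using PigStorage(',')
AS
(fn:chararray, ln:chararray, status:chararray, ph:chararray, custid1:long, add:chararray);
-- Load the product records.
pd_rec = load 'hdfs://localhost:8020/user/hadoop/Product.csv' using PigStorage(',')
AS (pdname:chararray, pddesc:chararray, pdcat:chararray, pdgty:long, pdid:long, pkwith:chararray);
-- Join sales to products, then to customers so that fn is available below.
sal_pd = join salrec by prod_id, pd_rec by pdid;
crs_rec = join sal_pd by custid, custrec by custid1;
-- Group on the composite key; multiple keys must be wrapped in parentheses.
crs_rec1 = group crs_rec by (custid, prod_id);
x = foreach crs_rec1 generate group, flatten(crs_rec.(custid,sale_id,pdname,qty_pur,fn));
-- Write the result back to HDFS, again using a fully qualified URI.
store x into 'hdfs://localhost:8020/user/hadoop/piglocaltohdfs2' USING PigStorage(',');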
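To inspect the stored result without leaving the Grunt shell, something like the following should work. It is a sketch that assumes the local-mode session can reach the same NameNode used in the store statement above.

```pig
-- List the part files that the store statement wrote to the output directory.
fs -ls hdfs://localhost:8020/user/hadoop/piglocaltohdfs2

-- Print the contents of every file in that directory.
cat hdfs://localhost:8020/user/hadoop/piglocaltohdfs2;
```

Reading the files with cat, rather than loading them back with PigStorage(','), avoids splitting the group tuple, which itself contains a comma.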
Sample output in HDFS:
(7,98243),7,34842,09K Video,1,Allen
(11,77623),11,34843,J Case 1500,2,John
(19,88734),19,34857,DVD J INT,7,Hubert
(24,45641),24,34856,500 GB HD T,5,Roger