Running Pig in local mode while reading files from HDFS and storing results in HDFS


----------------------
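Step 1: When Pig runs in local mode (pig -x local), a path written as an absolute URI with a scheme and an authority (for example hdfs://localhost:8020/...) is still resolved by Hadoop against the distributed file system. Input files can therefore be read from HDFS, and results can be stored in HDFS.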

salrec = load 'hdfs://localhost:8020/user/hadoop/Sales.csv' using PigStorage(',')
AS (custid:long, prod_id:long, qty_pur:int, pur_date:datetime, sale_id:long);

custrec = load 'hdfs://localhost:8020/user/hadoop/Customer.csv' using PigStorage(',')
AS (fn:chararray, ln:chararray, status:chararray, ph:chararray, custid1:long, add:chararray);

pd_rec = load 'hdfs://localhost:8020/user/hadoop/Product.csv' using PigStorage(',')
AS (pdname:chararray, pddesc:chararray, pdcat:chararray, pdgty:long, pdid:long, pkwith:chararray);

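-- Join sales to products on the product id.
crs_rec = join salrec by prod_id, pd_rec by pdid;

-- Join the result to customers so that fn is available in the output.
crs_cust = join crs_rec by salrec::custid, custrec by custid1;

-- Group on the composite key; multiple group keys must be in parentheses.
crs_rec1 = group crs_cust by (custid, prod_id);

x = foreach crs_rec1 generate group, flatten(crs_cust.(custid, sale_id, pdname, qty_pur, fn));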

store x into 'hdfs://localhost:8020/user/hadoop/piglocaltohdfs2' USING PigStorage(',');
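
To check the stored result without leaving the Grunt shell, commands along the following lines should work. This is a minimal sketch: it assumes the same NameNode address used above (localhost:8020) and that the job wrote its output as part files, which is the usual layout but was not shown in the original run.

```pig
-- List the output directory created by the store statement.
fs -ls hdfs://localhost:8020/user/hadoop/piglocaltohdfs2

-- Print the part files (names such as part-r-00000 are assumed here).
fs -cat hdfs://localhost:8020/user/hadoop/piglocaltohdfs2/part-*
```

The fully qualified URI is used here for the same reason as in the script: in local mode a bare path would be looked up on the local file system.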

Sample output in HDFS:
(7,98243),7,34842,09K Video,1,Allen
(11,77623),11,34843,J Case 1500,2,John
(19,88734),19,34857,DVD J INT,7,Hubert
(24,45641),24,34856,500 GB HD T,5,Roger
