Reading an XML file into Pig
----------------------
Save the following as a Pig script file, for example:
xmread.pig
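register /home/hadoop/Desktop/loader-forum.jar
-- XMLLoader('name') returns each <name>...</name> element as one chararray record
pigdata = load '/home/hadoop/Desktop/pig/Pig_Practicals/xml1.xml' using XMLLoader('name') as (doc:chararray);
-- Optional: save the raw XML records to check what the loader produced
store pigdata into '/home/hadoop/Desktop/pigoutputvalues1';
-- Extract the text between <name> and </name> from each record
values = foreach pigdata GENERATE FLATTEN(REGEX_EXTRACT_ALL(doc,'<name>(.*)</name>')) AS (name:chararray);
store values into '/home/hadoop/Desktop/pigoutputvalues';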
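The two steps in this script can be sketched in Python to show what each record looks like. The helper names `xml_records` and `regex_extract_all` and the sample XML are hypothetical. `xml_records` only approximates XMLLoader with a non-greedy regex over the whole text and assumes the tag is not nested inside itself. `regex_extract_all` assumes REGEX_EXTRACT_ALL's default behaviour: the regex has to match the whole record, otherwise the result is null.

```python
import re
from typing import List, Optional, Tuple


def xml_records(text: str, tag: str) -> List[str]:
    """Rough stand-in for XMLLoader(tag): each <tag>...</tag> element becomes one string.

    Assumption: elements with this tag are not nested inside each other.
    """
    pattern = re.compile(r"<{0}>.*?</{0}>".format(re.escape(tag)), re.DOTALL)
    return pattern.findall(text)


def regex_extract_all(value: str, regex: str) -> Optional[Tuple[str, ...]]:
    """Mimic REGEX_EXTRACT_ALL: all capture groups if the WHOLE string matches, else None."""
    m = re.fullmatch(regex, value)
    return m.groups() if m else None


# Hypothetical contents of xml1.xml
sample = """<people>
  <name>Alice</name>
  <name>Bob</name>
</people>"""

names = [regex_extract_all(doc, r"<name>(.*)</name>") for doc in xml_records(sample, "name")]
print(names)  # prints [('Alice',), ('Bob',)]
```

FLATTEN then turns each one-element tuple into a plain `name` field, giving one output line per `<name>` element.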
------------------------------------------------------------------------------------------------------------------
A second script, for xml2.xml, where each <Property> element holds several fields:
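-- Alternative: Piggybank's loader, referred to by its full name org.apache.pig.piggybank.storage.XMLLoader
--register '/usr/lib/pig-0.12.0/contrib/piggybank/java/piggybank.jar'
register /home/hadoop/Desktop/loader-forum.jar
-- Each <Property>...</Property> element becomes one chararray record
pigdata = load '/home/hadoop/Desktop/pig/Pig_Practicals/xml2.xml' USING XMLLoader('Property') as (doc:chararray);
-- The output directories must not already exist (e.g. from running the first script); Pig will not overwrite them
store pigdata into '/home/hadoop/Desktop/pigoutputvalues1';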
-- contact and PAN_Card are kept as chararray: a 10-digit phone number would overflow Pig's int, and PAN numbers contain letters
values = foreach pigdata GENERATE FLATTEN(REGEX_EXTRACT_ALL(doc,'<Property>\\s*<fname>(.*)</fname>\\s*<lname>(.*)</lname>\\s*<landmark>(.*)</landmark>\\s*<city>(.*)</city>\\s*<state>(.*)</state>\\s*<contact>(.*)</contact>\\s*<email>(.*)</email>\\s*<PAN_Card>(.*)</PAN_Card>\\s*<URL>(.*)</URL>\\s*</Property>'))
    AS (fname:chararray, lname:chararray, landmark:chararray, city:chararray, state:chararray,
        contact:chararray, email:chararray, PAN_Card:chararray, URL:chararray);
store values into '/home/hadoop/Desktop/pigoutputvalues';
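The multi-field regex can be checked outside Pig using a minimal Python sketch. It uses the same pattern after Pig turns `\\s` into `\s`. The `<Property>` record, `parse_property`, and `FIELDS` are hypothetical illustrations. The sample also shows why typed fields can cause trouble. A 10-digit contact number is larger than the 32-bit range of Pig's `int`. An alphanumeric PAN value cannot be cast to `long`.

```python
import re

# Same pattern as the Pig script; Pig un-escapes '\\s' in its string literal to '\s'.
# Without DOTALL, each (.*) stays on one line, so every field is captured separately.
PROPERTY_RE = re.compile(
    r"<Property>\s*<fname>(.*)</fname>\s*<lname>(.*)</lname>\s*"
    r"<landmark>(.*)</landmark>\s*<city>(.*)</city>\s*<state>(.*)</state>\s*"
    r"<contact>(.*)</contact>\s*<email>(.*)</email>\s*"
    r"<PAN_Card>(.*)</PAN_Card>\s*<URL>(.*)</URL>\s*</Property>"
)
FIELDS = ("fname", "lname", "landmark", "city", "state", "contact", "email", "PAN_Card", "URL")


def parse_property(doc):
    """Return a field dict if the whole record matches (like REGEX_EXTRACT_ALL), else None."""
    m = PROPERTY_RE.fullmatch(doc)
    return dict(zip(FIELDS, m.groups())) if m else None


# Hypothetical record, as XMLLoader('Property') would hand it over
record = """<Property>
  <fname>Ravi</fname>
  <lname>Kumar</lname>
  <landmark>Near Bus Stand</landmark>
  <city>Pune</city>
  <state>Maharashtra</state>
  <contact>9876543210</contact>
  <email>ravi@example.com</email>
  <PAN_Card>ABCDE1234F</PAN_Card>
  <URL>http://example.com</URL>
</Property>"""

row = parse_property(record)
print(row["contact"], row["PAN_Card"])
INT_MAX = 2**31 - 1  # largest value of Pig's 32-bit int
print(int(row["contact"]) > INT_MAX)  # prints True
```

If a record is missing a field or has the fields in a different order, the whole match fails. Pig then gets null for that record.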
__________________________
So the Pig script file is on the desktop, at /home/hadoop/Desktop/xmread.pig.
The data files xml1.xml and xml2.xml are in the directory /home/hadoop/Desktop/pig/Pig_Practicals/
____________________________
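Go to the $PIG_HOME/bin directory and run the script in local mode (-x local reads and writes the local file system instead of HDFS):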
$PIG_HOME/bin>./pig -x local /home/hadoop/Desktop/xmread.pig
______________________________________
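The extracted values are written to the directory /home/hadoop/Desktop/pigoutputvalues as part files. The raw XML records from the first store go to /home/hadoop/Desktop/pigoutputvalues1.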
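To inspect the results from Python, a small reader is enough. This is a minimal sketch that assumes the default PigStorage output format: tab-separated fields, one tuple per line, in files named like part-m-00000 inside the output directory. The function name `read_pig_output` is hypothetical.

```python
import glob
import os
from typing import List


def read_pig_output(out_dir: str) -> List[List[str]]:
    """Read every part-* file Pig wrote into out_dir, splitting each line on tabs.

    Assumes the default PigStorage format; other files in the directory are ignored.
    """
    rows = []
    for path in sorted(glob.glob(os.path.join(out_dir, "part-*"))):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                rows.append(line.rstrip("\n").split("\t"))
    return rows


if __name__ == "__main__":
    for row in read_pig_output("/home/hadoop/Desktop/pigoutputvalues"):
        print(row)
```

For the first script each row holds a single name. For the second script each row holds the nine Property fields in schema order.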