A Join in Hive between three tables(hands on explanation)
----------------------------------Step 1:
hive> describe sales;
OK
cust_id int
prod_no int
qty int
datestring string
sales_id int
Time taken: 0.104 seconds, Fetched: 5 row(s)
hive> describe customer_temp;
FAILED: SemanticException [Error 10001]: Table not found customer_temp
hive> describe customers_temp;
OK
first_name string
last_name string
category string
phone_number string
customer_id int
address array<string>
Time taken: 0.097 seconds, Fetched: 6 row(s)
hive> describe salescust_proddet;
OK
cid int
sid int
pno bigint
Time taken: 0.138 seconds, Fetched: 3 row(s)
Step2:
hive>select /*+streamtable(spd)*/ s.*, c.*, spd.* from sales s full outer join customers_temp c on (s.cust_id = c.customer_id) left outer join salescust_proddet spd on (spd.cid =c.customer_id);
Hint:88 rows with Nulls,Sample output(64 92387 4 2012-01-24 34847 Mello Reynolds A 905-245-4431 64 ["Mediumcity","67890"] 64 34826 98243
64 92387 4 2012-01-24 34847 Mello Reynolds A 905-245-4431 64 ["Mediumcity","67890"] 64 34833 77624
NULL NULL NULL NULL NULL Elizabeth Metzer A 416-322-9001 85 ["Bigcity","12345"] NULL NULL NULL
NULL NULL NULL NULL NULL Jovan Melcic A 905-324-4456 92 ["Mediumcity","67890"] NULL NULL NULL
NULL NULL NULL NULL NULL Erick Jansen A 905-498-8211 93 ["Mediumcity","67890"] NULL NULL NULL)
Step3:
hive>select /*+streamtable(spd)*/ s.*, c.*, spd.* from sales s full outer join customers_temp c on (s.cust_id = c.customer_id) right outer join salescust_proddet spd on (spd.cid =c.customer_id);
Hint:64 rows without nulls
No comments:
Post a Comment