A Join in Hive between three tables

 A Join in Hive between three tables(hands on explanation)

----------------------------------
 Step 1:
hive> describe sales;
OK
cust_id                 int                                        
prod_no                 int                                        
qty                     int                                        
datestring              string                                     
sales_id                int                                        
Time taken: 0.104 seconds, Fetched: 5 row(s)
hive> describe customer_temp;
FAILED: SemanticException [Error 10001]: Table not found customer_temp
hive> describe customers_temp;
OK
first_name              string                                     
last_name               string                                     
category                string                                     
phone_number            string                                     
customer_id             int                                        
address                 array<string>                              
Time taken: 0.097 seconds, Fetched: 6 row(s)
hive> describe salescust_proddet;
OK
cid                     int                                        
sid                     int                                        
pno                     bigint                                     
Time taken: 0.138 seconds, Fetched: 3 row(s)

Step2:
hive>select /*+streamtable(spd)*/ s.*, c.*, spd.* from sales s full outer join customers_temp c on (s.cust_id = c.customer_id) left outer join salescust_proddet spd on (spd.cid =c.customer_id);

Hint:88 rows with Nulls,Sample output(64    92387    4    2012-01-24    34847    Mello    Reynolds    A    905-245-4431    64    ["Mediumcity","67890"]    64    34826    98243
64    92387    4    2012-01-24    34847    Mello    Reynolds    A    905-245-4431    64    ["Mediumcity","67890"]    64    34833    77624
NULL    NULL    NULL    NULL    NULL    Elizabeth    Metzer    A    416-322-9001    85    ["Bigcity","12345"]    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    Jovan    Melcic    A    905-324-4456    92    ["Mediumcity","67890"]    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    Erick    Jansen    A    905-498-8211    93    ["Mediumcity","67890"]    NULL    NULL    NULL)


Step3:
hive>select /*+streamtable(spd)*/ s.*, c.*, spd.* from sales s full outer join customers_temp c on (s.cust_id = c.customer_id) right outer join salescust_proddet spd on (spd.cid =c.customer_id);

Hint:64 rows without nulls

No comments:

Post a Comment