Ctas With Three Table Joins In Two Different Databases In a Subquery(hands on explanation)
--------hive> describe default.pd_det;
OK
dept string
cost int
id int
asmb_city string
asmb_ct string
retail int
country string
# Partition Information
# col_name data_type comment
country string
------
hive> describe default.item_avgprofit_byctry ;
OK
avgprf int
makein string
itemshipped int
Time taken: 2.596 seconds, Fetched: 3 row(s)
----
hive> describe customersales.product_part ;
OK
prod_name string
description varchar(20)
qty_onhand int
prod_num bigint
packaged_with array<string>
category string
# Partition Information
# col_name data_type comment
category string
Time taken: 0.185 seconds, Fetched: 11 row(s)
-----
Step 2: A join between three tables
table 1:
default.pd_det a (join column :a.country)
table 2:
default.item_avgprofit_byctry (join cloumn:b.makein)
table3:
customersales.product_part c (join column: c.prod_num with a.id)
______________________
Step 3: CTAS
create table threejoin_ctastwodifferentdb stored as textfile as
select t3.cost,t3.pdid,t3.madein,t3.pdname from
(
select a.cost as cost, a.id as pdid, b.makein as madein, c.prod_name as pdname from default.pd_det a
join default.item_avgprofit_byctry b on (a.country = b.makein)
join customersales.product_part c on (a.id = c.prod_num)
)t3
------
hive> describe threejoin_ctastwodifferentdb;
OK
cost int
pdid int
madein string
pdname string
Time taken: 1.17 seconds, Fetched: 4 row(s)
CTAS has these restrictions:(which we created in the above example " threejoin_ctastwodifferentdb"cannot be
The target table cannot be a partitioned table.
The target table cannot be an external table.
The target table cannot be a list bucketing table.
No comments:
Post a Comment