Loading a Patitioned Hive Table with Cluster clause, sort

 Loading a Patitioned Hive Table with Cluster clause, sort(hands on explanation)

--------------------------------------------
 Step 1 : create a stage table to load data into table  directly from storage area.
hive> create table product_stage (prod_name string, description varchar(20),category string,qty_onhand int, prod_num bigint, packaged_with Array<string> )
 row format delimited
 fields terminated by ','
 collection items terminated by ':' 
stored as textfile;

step 2:create a partitioned table
>create table product_part (prod_name string, description varchar(20),qty_onhand int, prod_num bigint, packaged_with Array<string>)
partitioned by (category string)
 row format delimited
 fields terminated by ','
collection items terminated by ':'
 stored as textfile;

step 3: set hive.exec.dynamic.partition.mode=nonstrict (to dynamically load partition data without specifying it statically like :partition category='ram')
when loading partition dynamically the column should be last specified in the select clause.

step4:to load data
hive>from product_stage 
insert overwrite table product_part partition (category) 
select  prod_name, description,qty_onhand,prod_num, packaged_with,category
Hint:one part file for each partition

Additional :
hive>create table product_pcsb (prod_name string, description varchar(20),qty_onhand int, prod_num bigint, packaged_with Array<string>)
partitioned by (category string)
 CLUSTERED BY(prod_num) SORTED BY(qty_onhand ASC) into 2 buckets
 row format delimited
 fields terminated by ','
collection items terminated by ':'
 stored as textfile;
Hint: two part files will be produced for each partition with sorted on qty_onhand in each part file.

hive>create table product_pc3b (prod_name string, description varchar(20),qty_onhand int, prod_num bigint, packaged_with Array<string>)
 partitioned by (category string)
 CLUSTERED BY(prod_num) into 3 buckets
 row format delimited
 fields terminated by ','
collection items terminated by ':'
stored as textfile;
Hint:Three part file will be produced for each partition

Data file
PROD_NAME    DESCRIPTION    CATEGORY    QTY_ON_HAND    PROD_NUM    PACKAGED_WITH
2 GB Memory E    2 GB Memory ECC    Ram    3000    87655    manual:heatsink
4 GB Memory E    4 GB Memory ECC    Ram    1000    87659    manual:heatsink
16 GB Memory E    16 GB Memory ECC    Ram    238    87634    manual
500 GB HD J    500 GB HD Panther Brand    HD    200    45628    atacable:manual
500 GB HD T    500 GB HD Tiger Brand    HD    498    45641    satacable:manual
1 TB HD J    1 TB HD Jargon Brand    HD    231    45691   
4 Core CPU J3    4 Core CPU Jargon Brand 3 GHZ     CPU    50    98820    thermalpaste:heatsink:manual
2 Core CPU J2    2 Core CPU Jargon Brand 2  GHZ     CPU    118    98838    thermalpaste:heatsink
1 Core CPU J2    1 Core CPU Jargon Brand 2  GHZ     CPU    203    98792    thermalpaste:heatsink
94F991 MB    Motherboard F991 CPU    MB    19    282299   
94G822 MB    Motherboard G822 CPU    MB    30    282109   
93H772 MB    Motherboard H772 CPU    MB    15    282009    cables:screws
93G Video    Video Card Jargon Brand 93G    Video    80    99202    dvd:manual:game
84G1 Video    Video Card Jargon Brand 84F1    Video    14    99207    dvd:manual:hdmicable
09K Video    Video Card Tiger Brand 84F1    Video    5    98243    manual:game
J Case 1500    Computer Case Jargon Brand Style 1500    Case    20    77623    fans:manual:screws
J Case 1501    Computer Case Jargon Brand Style 1501    Case    18    77624    fans:manual:screws
T Case 4332    Computer Case Tiger Brand Style 4332    Case    7    88211    fans:manual:screws:watercooler
J Power 300W    Power Supply Jargon Brand 300 Watts    Power    28    92387    cables:screws
J Power 500W    Power Supply Jargon Brand 500 Watts    Power    17    92373    cables:screws
T Power 300W    Power Supply Tiger Brand 300 Watts    Power    8    93347    cables:screws
DVD J INT    DVD Jargon Brand Internal    Optical    23    88734    manual
DVD J EXT    DVD Jargon Brand External    Optical    45    88821   
DVD T INT    DVD Tiger Brand Internal    Optical    19    82331    satacable:manual
DVD T EXT    DVD Tiger Brand External    Optical    17    82337    satacable:manual

the below statement Still works but result is similar to a regular load with one part file  (clustered by , sorted by clauses does not do any)
hive>create table product_cls (prod_name string, description varchar(20),qty_onhand int, prod_num bigint, packaged_with Array<string>)
 CLUSTERED BY(prod_num) SORTED BY(qty_onhand ASC) into 5 buckets
row format delimited
 fields terminated by ','
collection items terminated by ':'
stored as textfile;

hive>from product
insert overwrite table product_cls 
select  prod_name, description,qty_onhand,prod_num, packaged_with

Errors:Without INTO BUCKETS  clause followed to CLUSTERED BY < column_name> a error will be generated.

No comments:

Post a Comment