Loading a GZIP- or BZIP2-compressed tar file into a Hive table, using CTAS and LIKE (hands-on explanation)

------------------------------
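Step 1: Set a few Hive properties

Property names are lowercase. The first setting compresses query output; the second selects block-level compression for sequence files.

SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;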

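Step 2: Compress the file into a gzipped tar archive

tar -cvzf /home/hadoop/Desktop/customer.tar.gz /home/hadoop/Desktop/Customer.csv

Note: Hive picks the gzip codec from the .gz extension and decompresses the file when it reads it, but it does not unpack the tar container, so the tar header bytes end up in front of the first row. If that matters, compress the CSV directly (gzip -c Customer.csv > Customer.csv.gz) and load that file instead.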
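To see what the tar container does to the data, the sketch below compares the first line a gzip-aware reader would get from a .tar.gz against a plain .gz. The sample row and the temporary directory are made up for illustration; the real Customer.csv is not shown in this post.

```shell
#!/bin/sh
# Hypothetical one-row sample; stands in for the real Customer.csv.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'John,Doe,retail,555-0100,1,Main St\n' > Customer.csv

# Variant from Step 2: tar archive compressed with gzip.
tar -czf customer.tar.gz Customer.csv

# Plain gzip, no tar container (-c writes to stdout and keeps the original).
gzip -c Customer.csv > Customer.csv.gz

# Undo only the gzip layer, as a gzip-aware reader would, and take the first line.
# NUL bytes from the tar header are stripped so the shell can hold the value.
tar_first=$(gzip -dc customer.tar.gz | head -n 1 | tr -d '\0')
gz_first=$(gzip -dc Customer.csv.gz | head -n 1)

echo "tar.gz first line: $tar_first"
echo "gz first line:     $gz_first"
```

The first line from the .tar.gz carries tar metadata ahead of the CSV data, while the plain .gz yields the row unchanged.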

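Step 3: Create a text-format table for the compressed file

create table customer_gz
(fn string, ln string, cat string, ph string, cid int, add array<string>)
row format delimited
fields terminated by ','
collection items terminated by ','
stored as textfile;

Note: fields and collection items both use ',' here, so Hive cannot tell the elements of add apart from separate columns. If add really holds several values per row, give the collection items a delimiter of their own that matches your data.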

Step 4: Load the compressed file into the table

load data local inpath '/home/hadoop/Desktop/customer.tar.gz'
into table customer_gz;

Step 5: Copy the data into a sequence file table with CTAS (create table as select)

create table customer_gz_seq
stored as sequencefile as
select * from customer_gz;

Step 6: Query the new table

select * from customer_gz_seq;

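Hint: A gzip-compressed text file is not splittable in Hadoop, so a single mapper has to read the whole file and the cluster's parallel processing power goes unused. (Bzip2 files can be split, but they are slow to decompress.) Sequence files with block compression stay splittable, so it is better to load the data into a sequence file table.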
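A rough way to see why gzip is not splittable: a reader dropped into the middle of a plain text file can skip to the next newline and carry on, but a reader dropped into the middle of a gzip stream has nothing to decode from. The sketch below only illustrates that idea with local files; it is not how Hadoop computes its splits, and the generated rows are hypothetical sample data.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical data: 10000 short CSV rows.
awk 'BEGIN { for (i = 1; i <= 10000; i++) print i ",sample" }' > rows.csv
gzip -c rows.csv > rows.csv.gz

txt_half=$(( $(wc -c < rows.csv) / 2 ))
gz_half=$(( $(wc -c < rows.csv.gz) / 2 ))

# Plain text: start at the midpoint, drop the (possibly partial) first line,
# and the next line is a complete row.
txt_row=$(tail -c +"$((txt_half + 1))" rows.csv | sed 1d | head -n 1)
echo "row read from the middle of the text file: $txt_row"

# Gzip: start at the midpoint and try to decompress from there.
if tail -c +"$((gz_half + 1))" rows.csv.gz | gzip -dc > /dev/null 2>&1; then
  gz_mid_readable=yes
else
  gz_mid_readable=no
fi
echo "gzip stream readable from the middle: $gz_mid_readable"
```

Block-compressed sequence files avoid this by compressing records in independent blocks separated by sync markers, which gives a reader places to start from.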

-----
To copy only a table's definition, without any data, create the new table with LIKE:

hive> create table customer_gz_seq_bckup like customer_gz_seq;

Hint: no other clauses can be placed between the new table name and LIKE.



