Hive Insert For New Records From Staging Table Into a Existing Hive Table.(For Scdtype-2)

 Hive Insert For New Records From Staging Table Into a Existing Hive Table.(For Scdtype-2)(hands on explanation)

-----------------------------------------------
Step 1:
In scd-type 2 implementation using date range table , to insert a new record from a  end of the day flat file ,into a existing hive table without overwrite . Load  the entire end of day flat file into stage table and then to existing hive table with comparisions between them.

hive>
create table employee_stage
(fn string , ln string, current_workingcity string, previously_workedcity string, start_date Array<string>,end_date  Array<string>, employee_id bigint)
 row format delimited
 fields terminated by ','
collection items terminated by '/'
 stored as textfile;

hive>load data inpath
'/user/hadoop/employee_day12march2014'
 into table employee_stage;

The hive supports only equi joins .If a table_stage table has 100 new records out of 1000 records that needs to be inserted into existing hive table "employee_history" (which is scdtype-2 with date range) without any errors execute

hive> insert overwrite  directory
'/user/hadoop/newinsert=day1march2014'
select c.employee_id from employee_stage c where c.employee_id
 NOT IN
(select e.employee_id from employee_history e);
Note: so you have a flat file with records of new employee_id's  that been inserted into employee_history

hive>
insert into table employee_history
 select c.* from employee_stage c where c.employee_id 
NOT IN
 (select e.employee_id from employee_history e); 
hint: make sure the employee_id column in employee_stage table in not null.

No comments:

Post a Comment