Hive Insert For New Records From Staging Table Into a Existing Hive Table.(For Scdtype-2)(hands on explanation)
-----------------------------------------------Step 1:
In scd-type 2 implementation using date range table , to insert a new record from a end of the day flat file ,into a existing hive table without overwrite . Load the entire end of day flat file into stage table and then to existing hive table with comparisions between them.
hive>
create table employee_stage
(fn string , ln string, current_workingcity string, previously_workedcity string, start_date Array<string>,end_date Array<string>, employee_id bigint)
row format delimited
fields terminated by ','
collection items terminated by '/'
stored as textfile;
hive>load data inpath
'/user/hadoop/employee_day12march2014'
into table employee_stage;
The hive supports only equi joins .If a table_stage table has 100 new records out of 1000 records that needs to be inserted into existing hive table "employee_history" (which is scdtype-2 with date range) without any errors execute
hive> insert overwrite directory
'/user/hadoop/newinsert=day1march2014'
select c.employee_id from employee_stage c where c.employee_id
NOT IN
(select e.employee_id from employee_history e);
Note: so you have a flat file with records of new employee_id's that been inserted into employee_history
hive>
insert into table employee_history
select c.* from employee_stage c where c.employee_id
NOT IN
(select e.employee_id from employee_history e);
hint: make sure the employee_id column in employee_stage table in not null.
No comments:
Post a Comment