Getting to the main point: how do you import data into HBase?
In my recent practice, there are four common ways:
- Use the Hive-HBase integration, which is supported in CDH 5+. It lets you import data with plain HQL. Easy, right?
- Use the bulk-load job `hbase org.apache.hadoop.hbase.mapreduce.ImportTsv`. It requires TSV-format input, which is not convenient for me.
- Write MapReduce jobs yourself. This takes a little time, but I think it gives you a better understanding of the relationship between HBase and HDFS.
- Use Pig, which I will describe below. In my recent experiments it is fast and easy to comprehend.
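For the first option, a minimal sketch of what the Hive-HBase integration looks like (the Hive table name and the source table are hypothetical; the HBase table and column family match the example below):

```sql
-- Create a Hive table backed by the HBase table 'hbase_nba'.
CREATE TABLE hive_nba(number string, name string, score string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
    'hbase.columns.mapping' = ':key,basic_informs:name,basic_informs:score'
)
TBLPROPERTIES ('hbase.table.name' = 'hbase_nba');

-- Then import data with ordinary HQL (some_source_table is hypothetical).
INSERT OVERWRITE TABLE hive_nba SELECT * FROM some_source_table;
```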
- First, prepare your own source file for HDFS.
$ cat /tmp/nba.txt
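The file contents are not shown above; for illustration, assume comma-separated rows of number, name, and score (hypothetical data, matching the schema of the Pig script below):

```text
24,Kobe,81
23,Jordan,69
35,Durant,66
```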
- Put the local file into HDFS:
$ hdfs dfs -put /tmp/nba.txt /user/hadoop/nba
- Create the table in HBase:
hbase> create 'hbase_nba', 'basic_informs'
- Create Load_HBase_Nba.pig:
raw_data = LOAD 'hdfs:/user/hadoop/nba' USING PigStorage(',') AS (
    number:chararray,
    name:chararray,
    score:chararray
);

STORE raw_data INTO 'hbase://hbase_nba' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'basic_informs:name basic_informs:score'
    );
- Use Pig to import the data:
$ /usr/bin/pig /home/training/Load_HBase_Nba.pig
Okay, that's it; you can now scan your HBase data. Tomorrow I'll test the performance, which is really exciting.
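To verify the import, you can scan the table from the HBase shell; adding a LIMIT keeps the output manageable on large tables:

```text
hbase> scan 'hbase_nba'
hbase> scan 'hbase_nba', {LIMIT => 10}
```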
- To pass parameters into Pig jobs:
$ pig -param date=20080201 Load_HBase_alogs.pig
In the Pig job:
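Inside the script, the parameter is referenced as $date. A hedged sketch of what Load_HBase_alogs.pig might contain (the path layout, schema, and table names are assumptions):

```text
-- Load only that day's logs; the date-partitioned path is an assumption.
raw_logs = LOAD 'hdfs:/user/hadoop/alogs/$date' USING PigStorage(',') AS (
    url:chararray,
    hits:chararray
);

STORE raw_logs INTO 'hbase://hbase_alogs' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('basic_informs:hits');
```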