A Custom Map/Reduce Example in Hive, in Java
Hive supports custom map and reduce scripts. Below I illustrate this with a simple word-count example.
If you develop directly in Java, you have to handle System.in, System.out, and all the key/value plumbing yourself, which is tedious. Someone wrote a small framework that lets you use a style similar to Hadoop's map and reduce, so you only have to focus on the map and reduce logic. This framework is now bundled with Hive as $HIVE_HOME/lib/hive-contrib-2.3.0.jar; the name of the contrib jar varies with the Hive version.

Development tool: IntelliJ IDEA
JDK: jdk1.7
Hive: 2.3.0
Hadoop: 2.8.1
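To make concrete what the framework saves you, here is a minimal sketch of a map script written without hive-contrib, handling System.in and System.out directly. The class name RawWordCountMap and every detail here are my own illustration, not from the original post; it assumes the tab-separated streaming convention that Hive uses for external scripts.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical sketch (all names mine): a map script WITHOUT the
// hive-contrib framework. You read rows from System.in and write
// tab-separated key/value pairs to System.out yourself.
public class RawWordCountMap {
    // Split one input line into "word<TAB>1" output lines.
    static String mapLine(String line) {
        StringBuilder out = new StringBuilder();
        for (String word : line.split("\\W+")) {
            if (!word.isEmpty()) {
                out.append(word).append('\t').append('1').append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.print(mapLine(line));
        }
    }
}
```

This is exactly the boilerplate the GenericMR framework below takes off your hands.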
1. Develop the map and reduce classes

The map class:

import java.io.IOException;
import org.apache.hadoop.hive.contrib.mr.GenericMR;
import org.apache.hadoop.hive.contrib.mr.Mapper;
import org.apache.hadoop.hive.contrib.mr.Output;

public class WordCountMap {
    public static void main(String[] args) throws Exception {
        new GenericMR().map(System.in, System.out, new Mapper() {
            @Override
            public void map(String[] strings, Output output) throws Exception {
                for (String str : strings) {
                    // If the source file is tab-delimited, no further splitting is
                    // needed: strings already holds the split words of each line.
                    String[] strs = str.split("\\W+");
                    for (String word : strs) {
                        output.collect(new String[]{word, "1"});
                    }
                }
            }
        });
    }
}

The reduce class:

import java.util.Iterator;
import org.apache.hadoop.hive.contrib.mr.GenericMR;
import org.apache.hadoop.hive.contrib.mr.Output;
import org.apache.hadoop.hive.contrib.mr.Reducer;

public class WordCountReducer {
    public static void main(String[] args) throws Exception {
        new GenericMR().reduce(System.in, System.out, new Reducer() {
            @Override
            public void reduce(String key, Iterator<String[]> records, Output output) throws Exception {
                int sum = 0;
                while (records.hasNext()) {
                    sum += Integer.valueOf(records.next()[1]);
                }
                output.collect(new String[]{key, String.valueOf(sum)});
            }
        });
    }
}

2. Export the jar

Export a jar containing these classes together with hive-contrib-2.3.0. Suppose the exported jar is named wordcount.jar.

3. Write the Hive SQL

drop table if exists raw_lines;
-- create table raw_lines and read all the lines in '/user/inputs', a path on your HDFS
create external table if not exists raw_lines(line string)
row format delimited
stored as textfile
location '/user/inputs';

drop table if exists word_count;
-- create table word_count, the output table, which will be written to '/user/outputs' as a text file
create external table if not exists word_count(word string, count int)
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as textfile location '/user/outputs/';

-- add the mapper & reducer jar as a resource; please change your/local/path
-- you must use "add file", not "add jar", or Hive cannot find the map and reduce main classes
add file your/local/path/wordcount.jar;

from (
  from raw_lines
  map raw_lines.line
  -- call the mapper here
  using 'java -cp wordcount.jar WordCountMap'
  as word, count
  cluster by word) map_output
insert overwrite table word_count
reduce map_output.word, map_output.count
-- call the reducer here
using 'java -cp wordcount.jar WordCountReducer'
as word, count;

Save this Hive SQL as wordcount.hql.

4. Run the Hive SQL

beeline -u [hiveserver] -n username -f wordcount.hql
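The query above wires the two scripts into a map → cluster by → reduce pipeline: map emits (word, 1) pairs, cluster by routes rows with the same key to one reducer in sorted order, and reduce sums each group. As a rough illustration of what that dataflow computes, here is a self-contained local simulation; the class LocalPipeline and method wordCount are hypothetical names of mine, not part of Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local simulation (all names mine) of the Hive
// map -> cluster by -> reduce dataflow used by wordcount.hql.
public class LocalPipeline {
    public static Map<String, Integer> wordCount(List<String> lines) {
        // Map phase: emit a (word, "1") pair per word.
        List<String[]> pairs = new ArrayList<String[]>();
        for (String line : lines) {
            for (String w : line.split("\\W+")) {
                if (!w.isEmpty()) pairs.add(new String[]{w, "1"});
            }
        }
        // "cluster by word": sort so that equal keys become adjacent.
        Collections.sort(pairs, new Comparator<String[]>() {
            public int compare(String[] a, String[] b) { return a[0].compareTo(b[0]); }
        });
        // Reduce phase: keys arrive grouped, so summing per key is a linear scan.
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String[] p : pairs) {
            Integer prev = counts.get(p[0]);
            counts.put(p[0], (prev == null ? 0 : prev) + Integer.parseInt(p[1]));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("hello hive", "hello world")));
        // prints {hello=2, hive=1, world=1}
    }
}
```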
Briefly, the internals of Hive's custom map and reduce: Hive's MAP/REDUCE/TRANSFORM syntax streams each row to the external process's standard input as a tab-separated line and parses the process's standard output back into rows. That is why the classes above only ever read System.in and write System.out, and why GenericMR can wrap them so thinly.
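On the reduce side, Hive streams the mapper's output to the reduce script as tab-separated lines already grouped by key on standard input. A hedged sketch of a reducer written without the contrib framework (the class name RawWordCountReduce and all details are mine): it sums the second column per key and emits a line whenever the key changes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical sketch (all names mine): a reduce script WITHOUT
// hive-contrib. Input lines are "word<TAB>count", grouped by word;
// output is "word<TAB>totalCount", one line per distinct word.
public class RawWordCountReduce {
    static String reduce(BufferedReader in) throws IOException {
        StringBuilder out = new StringBuilder();
        String current = null;
        int sum = 0;
        String line;
        while ((line = in.readLine()) != null) {
            String[] cols = line.split("\t");
            if (current != null && !current.equals(cols[0])) {
                // Key changed: the previous group is complete, emit it.
                out.append(current).append('\t').append(sum).append('\n');
                sum = 0;
            }
            current = cols[0];
            sum += Integer.parseInt(cols[1]);
        }
        if (current != null) {
            out.append(current).append('\t').append(sum).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(reduce(new BufferedReader(new InputStreamReader(System.in))));
    }
}
```

Note that this only works because the rows arrive key-grouped; that guarantee comes from the cluster by in the Hive query, and it is the same guarantee GenericMR relies on to build its per-key Iterator.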