Educoder Platform: MapReduce Basics in Practice


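Experiment 5: MapReduce Experiment: Word Count

5.1 Objective

Write a WordCount program based on the MapReduce approach.

5.2 Requirements

1. Understand the MapReduce programming model;
2. Be able to write a MapReduce version of WordCount;
3. Be able to run the program;
4. Analyze the execution process on your own.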

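5.3 Principles

MapReduce is a computation model. Put simply, a large batch of work (data) is split up and processed in pieces (map), and the partial results are then merged into a final result (reduce).

The benefit is that once a job has been split up, many machines can work on the pieces in parallel, which shortens the total running time.

Applicability: the data volume is large, but the number of distinct kinds of data (keys) is small enough to fit in memory.

Basic principle and key points: hand the data to different machines for processing, partition the data, and reduce the results.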
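The split-then-merge idea can be tried out without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the class name `LocalWordCount` and its methods are hypothetical names chosen for illustration, and the two sample lines are the ones used in the command-line steps later in this collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCount {

    // "Map" step: turn one line of text into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: merge all pairs by summing the counts for each word.
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs map over every line, then reduces the collected pairs.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) { // on a cluster, each line could go to a different machine
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        System.out.println(wordCount(lines));
        // prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```

Because `map` only looks at one line and `reduce` only needs the pairs for a given word, both steps can be distributed; that independence is what the real framework exploits.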

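Understanding MapReduce and YARN: in newer versions of Hadoop, YARN is the resource management and scheduling framework, and it serves as the runtime environment for MapReduce programs on Hadoop.

Besides YARN, MapReduce can also run on other scheduling frameworks such as Mesos and Corona. Each scheduling framework requires its own adaptation of Hadoop.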

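A complete MapReduce program executes on YARN as follows:

(1) The JobClient submits a job to the ResourceManager.

(2) The ResourceManager asks the Scheduler for a container in which to run the MRAppMaster, and then launches it.

(3) Once started, the MRAppMaster registers with the ResourceManager.

(4) The JobClient obtains information about the MRAppMaster from the ResourceManager and from then on communicates with the MRAppMaster directly.

(5) The MRAppMaster computes the input splits and constructs resource requests for all map tasks.

(6) The MRAppMaster performs the necessary preparation for the MR OutputCommitter.

(7) The MRAppMaster sends resource requests to the RM (Scheduler) and obtains a set of containers for the map/reduce tasks. Together with the NodeManagers, it then performs the necessary setup for each container, including resource localization.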
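Step (5) determines how many map tasks are requested, since each input split is handled by one map task. The sketch below shows the arithmetic under simplifying assumptions: it uses the clamp rule `max(minSize, min(maxSize, blockSize))` that Hadoop's `FileInputFormat` applies when choosing a split size, but it ignores block locations and the tolerance Hadoop allows for a slightly oversized last split. The class `SplitSketch`, its methods, and the 300 MB file are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Split size rule: clamp the block size between the configured minimum and maximum.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {offset, length} for each split of a file (simplified: no slop, no locality).
    public static List<long[]> computeSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            splits.add(new long[] {offset, Math.min(splitSize, fileLength - offset)});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed values: 128 MB blocks, no extra min/max constraints, a 300 MB input file.
        long splitSize = computeSplitSize(128 * mb, 1, Long.MAX_VALUE);
        List<long[]> splits = computeSplits(300 * mb, splitSize);
        for (long[] s : splits) {
            System.out.println("offset=" + s[0] / mb + "MB length=" + s[1] / mb + "MB");
        }
        System.out.println(splits.size() + " splits, so that many map containers would be requested");
    }
}
```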


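MapReduce Basic Programming

MapReduce is a programming model and software framework for large-scale data processing.

It divides a large data set into many smaller ones, processes them in parallel on multiple compute nodes, and finally combines the results.

MapReduce divides data processing into two phases: the Map phase and the Reduce phase.

In the Map phase, the data is divided into many small data sets, the same computation runs on each of them, and intermediate results are produced.

In the Reduce phase, the intermediate results are merged to produce the final result.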

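The basic MapReduce programming model consists of the following steps:

1. Reading the input data: input can come from the Hadoop Distributed File System (HDFS), the local file system, or other data sources.

2. Writing the Map phase: the developer writes a Map function. The framework splits the input into small data sets and calls the same Map function on the records of each one, generating intermediate results.

The output of the Map function is usually a key-value pair, where the key identifies the group an intermediate result belongs to and the value is the intermediate result itself.

3. Writing the Reduce phase: the developer writes a Reduce function. The framework groups the intermediate results by key, and the Reduce function merges the values for each key into the final result.

The output of the Reduce function is usually also a key-value pair, where the key identifies the final result and the value is its content.

4. Writing the output data: the final results can be written to HDFS or to other data stores.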
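The four steps above can be sketched as a small in-memory pipeline. This is an illustration in plain Java rather than the Hadoop API: `MiniMapReduce`, `run`, and `bestScores` are hypothetical names, and the `"name score"` input lines are made-up sample data in the same format as the Level 1 exercise later in this collection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

public class MiniMapReduce {

    // Runs map -> shuffle (group values by key) -> reduce over in-memory records.
    public static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> input,
            Function<I, List<Map.Entry<K, V>>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        // Steps 1 and 2: read each record and collect the mapper's key-value pairs by key.
        Map<K, List<V>> groups = new TreeMap<>();
        for (I record : input) {
            for (Map.Entry<K, V> pair : mapper.apply(record)) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        // Steps 3 and 4: reduce each group and store the final key-value pairs.
        Map<K, R> output = new TreeMap<>();
        for (Map.Entry<K, List<V>> group : groups.entrySet()) {
            output.put(group.getKey(), reducer.apply(group.getKey(), group.getValue()));
        }
        return output;
    }

    // Example job: highest score per student from "name score" lines.
    public static Map<String, Integer> bestScores(List<String> lines) {
        Function<String, List<Map.Entry<String, Integer>>> mapper = line -> {
            String[] parts = line.split(" ");
            List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
            pairs.add(Map.entry(parts[0], Integer.valueOf(parts[1])));
            return pairs;
        };
        BiFunction<String, List<Integer>, Integer> reducer = (name, scores) -> Collections.max(scores);
        return run(lines, mapper, reducer);
    }

    public static void main(String[] args) {
        System.out.println(bestScores(List.of("alice 82", "bob 75", "alice 91", "bob 68")));
        // prints {alice=91, bob=75}
    }
}
```

Keeping `run` generic makes the division of labor visible: the framework owns grouping and iteration, while the job supplies only the two functions.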

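Developing MapReduce programs requires Java or another supported programming language.

Beyond the basic programming model, it also helps to know some of the more advanced MapReduce mechanisms, such as Combiner, Partitioner, InputFormat, and OutputFormat.

These mechanisms can further improve the performance and scalability of a MapReduce program.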
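Two of these mechanisms are easy to illustrate in isolation. The sketch below is plain Java, not the Hadoop classes themselves: `partition` uses the same arithmetic as Hadoop's default `HashPartitioner` (clear the sign bit of the hash, then take the remainder), and `combine` shows the Combiner idea of pre-aggregating map output locally before it is sent to the reducers. The class name and sample keys are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionCombineSketch {

    // Chooses the reduce task for a key: non-negative hash modulo the number of reducers.
    public static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Combiner idea: sum the (word, 1) pairs produced by one map task before shuffling them.
    public static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String w : words) {
            partial.merge(w, 1, Integer::sum);
        }
        return partial;
    }

    public static void main(String[] args) {
        List<String> mapOutputKeys = List.of("Hello", "World", "Bye", "World");
        Map<String, Integer> combined = combine(mapOutputKeys);
        System.out.println(mapOutputKeys.size() + " pairs before combining, " + combined.size() + " after");
        for (Map.Entry<String, Integer> e : combined.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue() + " -> reducer " + partition(e.getKey(), 2));
        }
    }
}
```

A combiner is only safe when the reduce operation gives the same answer on partial groups, as sum and max do; an average, for example, cannot be combined this way without carrying a count along.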

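In short, MapReduce is a powerful tool for big data processing, and mastering its basic programming model is a prerequisite for big data analysis and processing.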

Educoder Platform: MapReduce Basics in Practice

MapReduce Level 1: Score Statistics

Solution code:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        /********** Begin **********/
        // Mapper: each input line is "<name> <score>"; emit (name, score).
        public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final IntWritable score = new IntWritable();
            private final Text word = new Text();

            @Override
            public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString(), "\n");
                while (itr.hasMoreTokens()) {
                    String[] str = itr.nextToken().split(" ");
                    word.set(str[0]);
                    score.set(Integer.parseInt(str[1]));
                    context.write(word, score);
                }
            }
        }

        // Reducer (also used as the combiner): keep the highest score for each name.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int maxScore = 0;
                for (IntWritable intWritable : values) {
                    maxScore = Math.max(maxScore, intWritable.get());
                }
                result.set(maxScore);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            String inputfile = "/user/test/input";
            String outputFile = "/user/test/output/";
            FileInputFormat.addInputPath(job, new Path(inputfile));
            FileOutputFormat.setOutputPath(job, new Path(outputFile));
            job.waitForCompletion(true);
            /********** End **********/
        }
    }

Command line:

    touch file01
    echo Hello World Bye World
    cat file01
    echo Hello World Bye World >file01
    cat file01
    touch file02
    echo Hello Hadoop Goodbye Hadoop >file02
    cat file02
    start-dfs.sh
    hadoop fs -mkdir /usr
    hadoop fs -mkdir /usr/input
    hadoop fs -ls /usr/output
    hadoop fs -ls /
    hadoop fs -ls /usr
    hadoop fs -put file01 /usr/input
    hadoop fs -put file02 /usr/input
    hadoop fs -ls /usr/input

Then run the evaluation.

MapReduce Level 2: Merging File Contents and Removing Duplicates

Code:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class Merge {
        /**
         * Merges the two files A and B and removes duplicate content,
         * producing a new output file C.
         */

        // Template hint: override map here and copy the input value to the output key.
        // The map method must declare: throws IOException, InterruptedException
        // This solution splits each line on a space and emits (first field, second field).
        /********** Begin **********/
        public static class Map extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String str = value.toString();
                String[] data = str.split(" ");
                Text t1 = new Text(data[0]);
                Text t2 = new Text(data[1]);
                context.write(t1, t2);
            }
        }
        /********** End **********/

        // Template hint: override reduce here and copy the input key to the output key.
        // The reduce method must declare: throws IOException, InterruptedException
        // This solution removes duplicate values for each key and writes them in sorted order.
        /********** Begin **********/
        public static class Reduce extends Reducer<Text, Text, Text, Text> {
            @Override
            protected void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                List<String> list = new ArrayList<>();
                for (Text text : values) {
                    String str = text.toString();
                    if (!list.contains(str)) {
                        list.add(str);
                    }
                }
                Collections.sort(list);
                for (String text : list) {
                    context.write(key, new Text(text));
                }
            }
            /********** End **********/
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(Merge.class);
            job.setMapperClass(Map.class);
            job.setCombinerClass(Reduce.class);
            job.setReducerClass(Reduce.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            String inputPath = "/user/tmp/input/";   // set the input path here
            String outputPath = "/user/tmp/output/"; // set the output path here
            FileInputFormat.addInputPath(job, new Path(inputPath));
            FileOutputFormat.setOutputPath(job, new Path(outputPath));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Then run the evaluation.

MapReduce Level 3: Information Mining: Mining Parent-Child Relationships

Code:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class simple_data_mining {
        // Set to 1 once the header row has been written (assumes a single reducer JVM).
        public static int time = 0;

        /**
         * Input: a child-parent table.
         * Output: a table of grandchild-grandparent relationships.
         */

        // Map splits each input line on the first space into child and parent, then emits
        // the pair twice: once keyed by the child (right table) and once keyed by the
        // parent (left table). The output value must carry a flag that tells the two apart.
        public static class Map extends Mapper<Object, Text, Text, Text> {
            @Override
            public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
                /********** Begin **********/
                String line = value.toString();
                int i = line.indexOf(' ');
                if (i < 0) {
                    return; // skip lines without a space separator
                }
                String child_name = line.substring(0, i);
                String parent_name = line.substring(i + 1);
                if (!child_name.equals("child")) { // skip the header row
                    String relation_type = "1"; // flag distinguishing left and right tables
                    context.write(new Text(parent_name),
                            new Text(relation_type + "+" + child_name + "+" + parent_name)); // left table
                    relation_type = "2";
                    context.write(new Text(child_name),
                            new Text(relation_type + "+" + child_name + "+" + parent_name)); // right table
                }
                /********** End **********/
            }
        }

        public static class Reduce extends Reducer<Text, Text, Text, Text> {
            @Override
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                /********** Begin **********/
                if (time == 0) { // write the header row once
                    context.write(new Text("grand_child"), new Text("grand_parent"));
                    time++;
                }
                List<String> grand_child = new ArrayList<>();
                List<String> grand_parent = new ArrayList<>();
                for (Text value : values) {
                    String record = value.toString();
                    if (record.isEmpty()) {
                        continue;
                    }
                    // record format: <relation_type>+<child>+<parent>
                    String[] parts = record.split("\\+", 3);
                    if (parts.length < 3) {
                        continue;
                    }
                    char relation_type = parts[0].charAt(0);
                    String child_name = parts[1];
                    String parent_name = parts[2];
                    if (relation_type == '1') {
                        // left table: the child is a grandchild candidate
                        grand_child.add(child_name);
                    } else {
                        // right table: the parent is a grandparent candidate
                        grand_parent.add(parent_name);
                    }
                }
                // Cross product: every child of this key paired with every parent of this key.
                for (String gc : grand_child) {
                    for (String gp : grand_parent) {
                        context.write(new Text(gc), new Text(gp)); // output the result
                    }
                }
                /********** End **********/
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "Single table join");
            job.setJarByClass(simple_data_mining.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            String inputPath = "/user/reduce/input";   // set the input path
            String outputPath = "/user/reduce/output"; // set the output path
            FileInputFormat.addInputPath(job, new Path(inputPath));
            FileOutputFormat.setOutputPath(job, new Path(outputPath));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Then run the evaluation.
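The Level 3 job is a single-table self-join: every person acts as the join key between their children (left table) and their parents (right table). The logic can be checked locally before submitting a job. The sketch below is plain Java with no Hadoop dependency; the class `GrandparentJoin` and the names in the sample rows are hypothetical and are not the platform's test data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GrandparentJoin {

    // Each row is {child, parent}. Returns sorted "grandchild grandparent" lines.
    public static List<String> join(List<String[]> rows) {
        Map<String, List<String>> childrenOf = new TreeMap<>(); // left table, keyed by parent
        Map<String, List<String>> parentsOf = new TreeMap<>();  // right table, keyed by child
        for (String[] row : rows) {
            childrenOf.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
            parentsOf.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        List<String> result = new ArrayList<>();
        // For every "middle" person, pair each of their children with each of their parents.
        for (Map.Entry<String, List<String>> middle : childrenOf.entrySet()) {
            List<String> grandparents = parentsOf.get(middle.getKey());
            if (grandparents == null) {
                continue; // this person has no recorded parents
            }
            for (String grandchild : middle.getValue()) {
                for (String grandparent : grandparents) {
                    result.add(grandchild + " " + grandparent);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"Tom", "Lucy"},
                new String[] {"Tom", "Jack"},
                new String[] {"Lucy", "Mary"},
                new String[] {"Lucy", "Ben"});
        System.out.println(join(rows)); // prints [Tom Ben, Tom Mary]
    }
}
```

In the MapReduce version the two maps are never built explicitly: the shuffle delivers both kinds of records for the same person to one `reduce` call, and the flag in the value says which table each record came from.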

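MapReduce Programming Experiment Report: Reflections

Summary: Through hands-on work, this MapReduce programming experiment gave me a deeper understanding of the MapReduce programming framework.

During the experiment I learned how to write map and reduce functions and how to use them to extract and aggregate data from a large data set.

I also learned how to debug and tune MapReduce jobs to improve processing efficiency and performance.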

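I. Objective: The goal of this experiment was to learn how to use the MapReduce programming framework, understand how it works, and become proficient at using map and reduce functions for data processing and analysis in real code.

II. Environment and tools: This experiment used the Hadoop distributed computing framework for MapReduce programming.

The tools included a Hadoop cluster, the HDFS distributed file system, and the Java programming language.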

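III. Procedure:

1. Preparation: Before starting, I studied the basic concepts and characteristics of MapReduce and how to configure and use a Hadoop cluster.

2. Design: Following the experiment requirements, I chose a suitable data set and designed the map and reduce functions for the specific task.

During design I paid close attention to the structure of the data and the processing logic so that the MapReduce job would complete efficiently.

3. Coding: I implemented the map and reduce functions in Java.

Following the MapReduce programming model, I processed the data using input key-value pairs and intermediate key-value pairs.

While coding I paid attention to code conventions and readability and made some optimizations where appropriate.

4. Testing: After coding, I deployed and ran my MapReduce job on the Hadoop cluster.

By analyzing and processing the data set, I verified the correctness and performance of my map and reduce functions.

5. Summary: After the experiment, I reviewed the work.

I analyzed the problems and challenges I ran into and proposed solutions for them.

I also assessed the strengths and weaknesses of the MapReduce programming framework and gave my own views and suggestions.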

IV. Results and views: In this experiment I successfully processed the chosen data set with MapReduce.

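MapReduce and HBase Practical Training: Self-Summary

1. Introduction

After the practical training in MapReduce and HBase, I came to appreciate how important these two technologies are for big data processing and storage.

This document summarizes what I learned and experienced in the training, including the basic principles and application scenarios of MapReduce, the characteristics and usage of HBase, and the challenges I met and how I solved them.

2. Principles and Applications of MapReduce

2.1 The concept of MapReduce

MapReduce is a distributed computing framework proposed by Google for large-scale data processing and analysis.

Its basic principle is to decompose a job into Map and Reduce phases and to improve processing efficiency through parallel computation and data partitioning.

2.2 Application scenarios of MapReduce

MapReduce is widely used for big data processing and analysis, and is particularly suited to the following scenarios:

- Data cleaning and transformation: MapReduce can filter, clean, and transform raw data to extract useful information;
- Data aggregation and statistics: MapReduce can aggregate and summarize data at scale, for example computing averages or finding maximum values;
- Inverted indexes: MapReduce can build inverted indexes for applications such as search engines;
- Graph computation: MapReduce can be used for graph algorithms such as PageRank.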
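To make the inverted-index scenario concrete: the map step emits a (word, document) pair for every word, and the reduce step collects the set of documents for each word. The sketch below does this in memory in plain Java. The class `InvertedIndexSketch` is a hypothetical name, and the three small documents are borrowed from the word-count exercise elsewhere in this collection purely as sample data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {

    // Builds word -> sorted set of document names from document name -> text.
    public static Map<String, Set<String>> build(Map<String, String> docs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            // "Map": one (word, document) pair per word in the document.
            for (String word : doc.getValue().toLowerCase().split("\\W+")) {
                if (word.isEmpty()) {
                    continue;
                }
                // "Reduce": gather all documents that contain the same word.
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("file1.txt", "hello dblab world");
        docs.put("file2.txt", "hello dblab hadoop");
        docs.put("file3.txt", "hello mapreduce");
        build(docs).forEach((word, files) -> System.out.println(word + " -> " + files));
    }
}
```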

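3. Characteristics and Usage of HBase

3.1 The concept and characteristics of HBase

HBase is a distributed, scalable, column-oriented NoSQL database that stores its data on Hadoop's HDFS.

Its characteristics include:

- High reliability: HBase ensures data reliability through redundant storage and automatic failover;
- High performance: HBase supports fast reads, writes, and random access, which suits real-time query and write workloads;
- Horizontal scaling: HBase scales out by adding nodes, so it can keep up with growing data volumes;
- Flexible data model: HBase tables have a flexible structure, and values are stored as byte arrays, so it can accommodate a wide variety of storage needs.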
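One way to picture the column-oriented, flexible data model is as a nested sorted map: row key, then column (`family:qualifier`), then timestamp, then value. The sketch below models only that logical view in plain Java. It is not the HBase client API; the class `HBaseModelSketch`, its `put` and `get` methods, and the sample row are hypothetical, and values are shown as strings rather than byte arrays for readability.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class HBaseModelSketch {

    // row key -> "family:qualifier" -> timestamp (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    // Stores one version of a cell; rows and columns are created on demand.
    public void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    public String get(String rowKey, String column) {
        TreeMap<String, TreeMap<Long, String>> row = rows.get(rowKey);
        if (row == null) {
            return null;
        }
        TreeMap<Long, String> versions = row.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        HBaseModelSketch table = new HBaseModelSketch();
        table.put("student001", "info:name", 1L, "Tom");
        table.put("student001", "score:math", 1L, "82");
        table.put("student001", "score:math", 2L, "91"); // a newer version of the same cell
        System.out.println(table.get("student001", "score:math")); // prints 91
        System.out.println(table.get("student001", "score:art"));  // prints null
    }
}
```

Rows need not share the same columns, which is the sense in which the table structure is flexible; a missing cell simply has no entry.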

Experiment 4: MapReduce Programming

"Big Data Technology: Principles and Applications"
Lab Guide
Introductory Practice in MapReduce Programming
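1 Objectives
1. Understand the processing logic of the MapReduce module in Hadoop
2. Become familiar with MapReduce programming
2 Platform
Operating system: Linux
Tools: Eclipse, IntelliJ IDEA, or another Java IDE
3 Tasks and Requirements
1. Create a folder named input on your computer, and in it create three text files:
file1.txt, file2.txt, file3.txt
The contents of the three files are:
file1.txt: hello dblab world
file2.txt: hello dblab hadoop
file3.txt: hello mapreduce
2. Start Hadoop in pseudo-distributed mode and upload the input folder to HDFS.
3. Write a MapReduce program that counts how many times each word occurs. Save the results to the output folder on HDFS.
4. Retrieve the results (provide a screenshot or the result data).
4 Experiment Report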

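MapReduce Experiment Report: Summary

I. Introduction

MapReduce is a programming model, with an associated implementation, for processing and generating large data sets. It was proposed by Google and is widely used in big data processing.

With MapReduce, a large data set is broken down into many small tasks that are assigned to multiple compute nodes and processed in parallel, which greatly improves processing efficiency.

In this experiment, we gained a deeper understanding of how MapReduce works through hands-on practice and tried to solve some practical big data processing problems.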

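II. Principles

MapReduce is a programming model that processes large-scale data in two core phases: the Map phase and the Reduce phase.

The Map phase processes each element of the input data set and produces a set of intermediate results; the Reduce phase then summarizes and aggregates the output of the Map phase to produce the final result.

Through parallel processing and distributed computing, MapReduce can process large data sets efficiently across many compute nodes.

In this experiment, we used the Hadoop platform to implement the MapReduce model.

Hadoop is an open-source distributed computing framework that provides a range of data processing capabilities, including MapReduce.

With Hadoop, it is straightforward to set up a distributed computing environment for large-scale data processing.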

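III. Procedure

1. Data preparation: First, we prepared a large data set, which may be structured or unstructured.

In this experiment, we used a CSV file containing a large amount of text data.

2. Writing the Map task: Based on the processing requirements, we wrote a Map task that reads text from the input data set, extracts keywords, and classifies them.

3. Writing the Reduce task: Based on the output of the Map task, we wrote a Reduce task that gathers the text data sharing the same keyword and produces the final result.

4. Running the MapReduce job: We packaged the Map and Reduce tasks into an executable program and submitted it as a job through the Hadoop job scheduler, which runs it in parallel.

5. Data analysis: We retrieved the processed results and analyzed them to verify that the processing was effective.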

IV. Results and Analysis

After the experiment, we obtained the processed data results.

