Run Hadoop MapReduce jobs over
Avro data, with map and reduce functions written in Java.

Avro data files contain a sequence of values rather than the
  key/value pairs that Hadoop's MapReduce API expects, so this
  package provides a layer on top of that API.

In all cases, input and output paths are set and jobs are submitted
  as with standard Hadoop jobs:

- Specify input files with {@link
   org.apache.hadoop.mapred.FileInputFormat#setInputPaths}
- Specify an output directory with {@link
   org.apache.hadoop.mapred.FileOutputFormat#setOutputPath}
- Run your job with {@link org.apache.hadoop.mapred.JobClient#runJob}
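
A minimal sketch of these three steps, using the `org.apache.hadoop.mapred` API. The class name `JobPaths`, the job name, and the `configurePaths` helper are hypothetical; mapper, reducer, and schema settings are deliberately left out.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class JobPaths {

  /** Hypothetical helper: returns a JobConf with only its paths set. */
  public static JobConf configurePaths(String input, String output) {
    JobConf job = new JobConf();
    job.setJobName("avro-example");                          // illustrative name
    FileInputFormat.setInputPaths(job, new Path(input));     // input files
    FileOutputFormat.setOutputPath(job, new Path(output));   // output directory
    return job;
  }

  public static void main(String[] args) throws IOException {
    JobConf job = configurePaths(args[0], args[1]);
    // Mapper, reducer, and schema configuration would go here.
    JobClient.runJob(job);  // submits the job and waits for it to finish
  }
}
```
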

For jobs whose input and output are Avro data files:

- Call {@link org.apache.avro.mapred.AvroJob#setInputSchema} and
   {@link org.apache.avro.mapred.AvroJob#setOutputSchema} with your
   job's input and output schemas.
- Subclass {@link org.apache.avro.mapred.AvroMapper} and specify
   this as your job's mapper with {@link
   org.apache.avro.mapred.AvroJob#setMapperClass}
- Subclass {@link org.apache.avro.mapred.AvroReducer} and specify
   this as your job's reducer and perhaps combiner, with {@link
   org.apache.avro.mapred.AvroJob#setReducerClass} and {@link
   org.apache.avro.mapred.AvroJob#setCombinerClass}
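
As an illustration, a word count over an Avro file of strings might be configured as sketched below. The class names (`AvroWordCount`, `TokenMapper`, `SumReducer`) and the choice of a string-to-long `Pair` output schema are assumptions made for this example. Paths are expected to be set on the `JobConf` separately.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public class AvroWordCount {

  /** Emits a (word, 1) pair for every token of each input string. */
  public static class TokenMapper extends AvroMapper<Utf8, Pair<Utf8, Long>> {
    @Override
    public void map(Utf8 text, AvroCollector<Pair<Utf8, Long>> collector,
                    Reporter reporter) throws IOException {
      StringTokenizer tokens = new StringTokenizer(text.toString());
      while (tokens.hasMoreTokens()) {
        collector.collect(
            new Pair<Utf8, Long>(new Utf8(tokens.nextToken()), 1L));
      }
    }
  }

  /** Sums the counts for one word; also usable as the combiner. */
  public static class SumReducer
      extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 word, Iterable<Long> counts,
                       AvroCollector<Pair<Utf8, Long>> collector,
                       Reporter reporter) throws IOException {
      long sum = 0;
      for (long count : counts) {
        sum += count;
      }
      collector.collect(new Pair<Utf8, Long>(word, sum));
    }
  }

  /** Applies the Avro-specific settings to an existing JobConf. */
  public static void configure(JobConf job) {
    Schema string = Schema.create(Schema.Type.STRING);
    Schema pair = Pair.getPairSchema(string, Schema.create(Schema.Type.LONG));

    AvroJob.setInputSchema(job, string);
    AvroJob.setOutputSchema(job, pair);
    AvroJob.setMapperClass(job, TokenMapper.class);
    AvroJob.setCombinerClass(job, SumReducer.class);
    AvroJob.setReducerClass(job, SumReducer.class);
  }
}
```

The same reducer class can serve as the combiner here because its output type matches the mapper's output type.
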

For jobs whose input is an Avro data file and which use an {@link
  org.apache.avro.mapred.AvroMapper}, but whose reducer is a non-Avro
  {@link org.apache.hadoop.mapred.Reducer} and whose output is a
  non-Avro format:

- Call {@link org.apache.avro.mapred.AvroJob#setInputSchema} with your
   job's input schema.
- Subclass {@link org.apache.avro.mapred.AvroMapper} and specify
   this as your job's mapper with {@link
   org.apache.avro.mapred.AvroJob#setMapperClass}
- Implement {@link org.apache.hadoop.mapred.Reducer} and specify
   your job's reducer with {@link
   org.apache.hadoop.mapred.JobConf#setReducerClass}.  The input key
   and value types should be {@link org.apache.avro.mapred.AvroKey} and {@link
   org.apache.avro.mapred.AvroValue}.
- Optionally implement {@link org.apache.hadoop.mapred.Reducer} and
   specify your job's combiner with {@link
   org.apache.hadoop.mapred.JobConf#setCombinerClass}.  The reducer class
   cannot be reused as the combiner, because a combiner's input and output
   keys must both be {@link org.apache.avro.mapred.AvroKey} and its input
   and output values must both be {@link org.apache.avro.mapred.AvroValue}.
- Specify your job's output key and value types with {@link
   org.apache.hadoop.mapred.JobConf#setOutputKeyClass} and {@link
   org.apache.hadoop.mapred.JobConf#setOutputValueClass}.
- Specify your job's output format with {@link
   org.apache.hadoop.mapred.JobConf#setOutputFormat}.
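
A sketch of this mixed configuration follows. The class names are hypothetical, and text output with `Text`/`LongWritable` types is just one possible choice. The call to `AvroJob#setMapOutputSchema` is not among the steps listed above; it is included on the assumption that, with no Avro output schema set, the shuffle still needs to know the mapper's `Pair` schema.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;

public class AvroMapHadoopReduce {

  /** Avro mapper: emits (line, 1) for each input string. */
  public static class LineCountMapper
      extends AvroMapper<Utf8, Pair<Utf8, Long>> {
    @Override
    public void map(Utf8 line, AvroCollector<Pair<Utf8, Long>> collector,
                    Reporter reporter) throws IOException {
      collector.collect(new Pair<Utf8, Long>(line, 1L));
    }
  }

  /** Plain Hadoop reducer: reads AvroKey/AvroValue, writes Writables. */
  public static class TextSumReducer extends MapReduceBase
      implements Reducer<AvroKey<Utf8>, AvroValue<Long>, Text, LongWritable> {
    @Override
    public void reduce(AvroKey<Utf8> key, Iterator<AvroValue<Long>> values,
                       OutputCollector<Text, LongWritable> out,
                       Reporter reporter) throws IOException {
      long sum = 0;
      while (values.hasNext()) {
        sum += values.next().datum();
      }
      out.collect(new Text(key.datum().toString()), new LongWritable(sum));
    }
  }

  /** Applies the job settings to an existing JobConf. */
  public static void configure(JobConf job) {
    Schema string = Schema.create(Schema.Type.STRING);
    AvroJob.setInputSchema(job, string);
    // Assumption: the intermediate (map output) schema must be declared.
    AvroJob.setMapOutputSchema(
        job, Pair.getPairSchema(string, Schema.create(Schema.Type.LONG)));
    AvroJob.setMapperClass(job, LineCountMapper.class);

    job.setReducerClass(TextSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    job.setOutputFormat(TextOutputFormat.class);
  }
}
```
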

For jobs whose input is a non-Avro data file and which use a
  non-Avro {@link org.apache.hadoop.mapred.Mapper}, but whose reducer
  is an {@link org.apache.avro.mapred.AvroReducer} and whose output is
  an Avro data file:

- Set your input file format with {@link
   org.apache.hadoop.mapred.JobConf#setInputFormat}.
- Implement {@link org.apache.hadoop.mapred.Mapper} and specify
   your job's mapper with {@link
   org.apache.hadoop.mapred.JobConf#setMapperClass}.  The output key
   and value types should be {@link org.apache.avro.mapred.AvroKey} and
   {@link org.apache.avro.mapred.AvroValue}.
- Subclass {@link org.apache.avro.mapred.AvroReducer} and specify
   this as your job's reducer and perhaps combiner, with {@link
   org.apache.avro.mapred.AvroJob#setReducerClass} and {@link
   org.apache.avro.mapred.AvroJob#setCombinerClass}
- Call {@link org.apache.avro.mapred.AvroJob#setOutputSchema} with your
   job's output schema.
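
The reverse arrangement could look like the sketch below, which reads plain text lines and writes Avro pairs. The class names and the use of `TextInputFormat` are assumptions for illustration. The mapper's `AvroKey`/`AvroValue` types are chosen to line up with the key and value of the output `Pair` schema.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class HadoopMapAvroReduce {

  /** Plain Hadoop mapper: wraps each text line as AvroKey/AvroValue. */
  public static class LineMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroKey<Utf8>, AvroValue<Long>> {
    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<AvroKey<Utf8>, AvroValue<Long>> out,
                    Reporter reporter) throws IOException {
      out.collect(new AvroKey<Utf8>(new Utf8(line.toString())),
                  new AvroValue<Long>(1L));
    }
  }

  /** Avro reducer: sums the counts seen for each distinct line. */
  public static class SumReducer
      extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 line, Iterable<Long> counts,
                       AvroCollector<Pair<Utf8, Long>> collector,
                       Reporter reporter) throws IOException {
      long sum = 0;
      for (long count : counts) {
        sum += count;
      }
      collector.collect(new Pair<Utf8, Long>(line, sum));
    }
  }

  /** Applies the job settings to an existing JobConf. */
  public static void configure(JobConf job) {
    job.setInputFormat(TextInputFormat.class);
    job.setMapperClass(LineMapper.class);

    AvroJob.setReducerClass(job, SumReducer.class);
    AvroJob.setOutputSchema(job, Pair.getPairSchema(
        Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.LONG)));
  }
}
```
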

For jobs whose input is a non-Avro data file and which use a
  non-Avro {@link org.apache.hadoop.mapred.Mapper} and no reducer,
  i.e., a map-only job:

- Set your input file format with {@link
   org.apache.hadoop.mapred.JobConf#setInputFormat}.
- Implement {@link org.apache.hadoop.mapred.Mapper} and specify
   your job's mapper with {@link
   org.apache.hadoop.mapred.JobConf#setMapperClass}.  The output key
   and value types should be {@link org.apache.avro.mapred.AvroWrapper} and
   {@link org.apache.hadoop.io.NullWritable}.
- Call {@link
   org.apache.hadoop.mapred.JobConf#setNumReduceTasks(int)} with zero.
- Call {@link org.apache.avro.mapred.AvroJob#setOutputSchema} with your
   job's output schema.
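
A map-only conversion from text lines to an Avro file of strings might be set up as follows. This is a sketch under assumptions: the class names, `TextInputFormat`, and the plain string output schema are chosen only for the example.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroWrapper;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class TextToAvroMapOnly {

  /** Wraps each text line in an AvroWrapper; the value is always null. */
  public static class WrapMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroWrapper<Utf8>, NullWritable> {
    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<AvroWrapper<Utf8>, NullWritable> out,
                    Reporter reporter) throws IOException {
      out.collect(new AvroWrapper<Utf8>(new Utf8(line.toString())),
                  NullWritable.get());
    }
  }

  /** Applies the job settings to an existing JobConf. */
  public static void configure(JobConf job) {
    job.setInputFormat(TextInputFormat.class);
    job.setMapperClass(WrapMapper.class);
    job.setNumReduceTasks(0);  // map-only: no shuffle, no reducer
    AvroJob.setOutputSchema(job, Schema.create(Schema.Type.STRING));
  }
}
```
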