Sunday, December 23, 2012

Hadoop: Was the job really successful?


An accurate determination of success is critical. 

The check for success primarily involves ensuring that the number of records output is roughly the same as the number of records input. Hadoop jobs generally process bulk real-world data, which is never 100% clean, so a small error rate is usually acceptable.

It is good practice to wrap the body of your map and reduce methods in a try block that catches Throwable and reports what was caught, so that a bad record is counted instead of passing unnoticed.

Each call on the reporter object or the output collector also provides a heartbeat to the framework. Counters incremented through the reporter record what happened to each input record:

reporter.incrCounter( "Input", "total records", 1 );
reporter.incrCounter( "Input", "parsed records", 1 );
reporter.incrCounter( "Input", "number format", 1 );
reporter.incrCounter( "Input", "Exception", 1 );
// Better to use enums than strings for counter names, to avoid spelling mistakes or stray trailing spaces.
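To make the try block and the enum advice concrete, here is a minimal sketch in plain Java. It does not use Hadoop at all: the counters map is a hypothetical stand-in for the framework's counters, and the names CountingMapperSketch, InputCounter, incr and map are made up for illustration. It only shows the shape of a map body that counts every outcome.

```java
import java.util.EnumMap;
import java.util.Map;

public class CountingMapperSketch {
    /** Counter names as an enum, so a misspelled name is a compile error. */
    enum InputCounter { TOTAL_RECORDS, PARSED_RECORDS, NUMBER_FORMAT, EXCEPTION }

    // Hypothetical stand-in for the framework's counters (not a Hadoop class).
    static final Map<InputCounter, Long> counters = new EnumMap<>(InputCounter.class);

    /** Adds one to the given counter, creating it on first use. */
    static void incr(InputCounter c) {
        counters.merge(c, 1L, Long::sum);
    }

    /** Mirrors the body of a map() call: parse the key as a long and count the outcome. */
    static void map(String key) {
        incr(InputCounter.TOTAL_RECORDS);
        try {
            Long.parseLong(key.trim());
            incr(InputCounter.PARSED_RECORDS);
        } catch (NumberFormatException e) {
            // Expected kind of dirty data: count it and move on.
            incr(InputCounter.NUMBER_FORMAT);
        } catch (Throwable t) {
            // Anything unexpected (here, a null key) is counted separately.
            incr(InputCounter.EXCEPTION);
        }
    }

    public static void main(String[] args) {
        for (String key : new String[] { "42", " 7 ", "abc", null }) {
            map(key);
        }
        System.out.println(counters);
    }
}
```

In a real mapper the incr calls would be calls on the reporter. The old mapred Reporter also accepts an enum as the counter key, which is what makes the enum approach practical.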



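After the job finishes, the driver reads the counters back and decides whether the run was good enough. The snippet below assumes that jobCounters holds the counters of the completed job, and that total, parsed, format and exceptions are long values read from the "Input" counters shown above.

if (format != 0) {
    logger.warn( "There were " + format + " keys that were not transformable to long values" );
}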
/* Check to see if we had any unexpected exceptions. This usually indicates
 * some significant problem, either with the machine running the task that
 * had the exception, or with the map or reduce function code. Log an error
 * for each type of exception, with its count.
 */
if (exceptions > 0) {
    Counters.Group exceptionGroup = jobCounters.getGroup( TransformKeysToLongMapper.EXCEPTIONS );
    for (Counters.Counter counter : exceptionGroup) {
        logger.error( "There were " + counter.getCounter()
            + " exceptions of type " + counter.getDisplayName() );
    }
}
if (total == parsed) {
    logger.info( "The job completed successfully." );
    System.exit( 0 );
}
// Some input records could not be handled. Did enough records process for
// the job to count as successful? Here 90% is treated as good enough.
if (total * .9 <= parsed) {
    logger.warn( "The job completed with some errors, " + (total - parsed) + " out of " + total );
    System.exit( 0 );
}
logger.error( "The job did not complete successfully, too many errors processing the input, only "
    + parsed + " of " + total + " records completed" );
System.exit( 1 );
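One possible refinement, sketched below in plain Java, is to separate the decision from the System.exit calls so the threshold can be changed and the logic can be unit tested. The names JobVerdict, Verdict, judge and exitCode are hypothetical, and the 0.9 passed in main is just the 90% figure used above, not a recommended value.

```java
public class JobVerdict {
    enum Verdict { SUCCESS, SUCCESS_WITH_ERRORS, FAILURE }

    /**
     * Classifies a run from its record counts.
     * minGoodFraction is the share of input records that must parse, e.g. 0.9.
     */
    static Verdict judge(long total, long parsed, double minGoodFraction) {
        if (total == parsed) {
            return Verdict.SUCCESS;
        }
        if (parsed >= total * minGoodFraction) {
            return Verdict.SUCCESS_WITH_ERRORS;
        }
        return Verdict.FAILURE;
    }

    /** Maps a verdict to a process exit code: 0 unless the job failed. */
    static int exitCode(Verdict v) {
        return v == Verdict.FAILURE ? 1 : 0;
    }

    public static void main(String[] args) {
        // Assumed sample counts, purely for illustration.
        Verdict v = judge(1000, 950, 0.9);
        System.out.println(v + " -> exit " + exitCode(v));
    }
}
```

The driver would then log according to the verdict and call System.exit( exitCode( verdict ) ) in a single place.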
