The AWS team has announced a new release of Amazon Elastic MapReduce (EMR), version 5.0.0. This release supports 16 open source Hadoop ecosystem projects, uses Tez by default for Hive and Pig, brings user interface enhancements to Hue and Zeppelin, and improves debugging functionality.
Since July last year, AWS has been continually updating the service and adding support for a growing number of Hadoop projects, to give its customers a wide range of choices.
A quick recap on the year’s launches:
EMR 5.0.0 new features:
- Support for 16 Open Source Hadoop Ecosystem Projects
- Major Version Upgrade for Spark and Hive
- User Interface Improvements
- Enhanced Debugging Functionality
- Launch a Cluster Today
Support for 16 Open Source Hadoop Ecosystem Projects:
AWS used Apache Bigtop, an open source project for infrastructure engineers and data scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components. Using Bigtop has helped the team speed up the pace of development.
When you create a new EMR cluster, you can choose the set of applications you want installed. The available applications include Apache Hadoop, Apache Spark, Presto, Apache Hive, Apache HBase, and Apache Tez.
Major Version Upgrade for Spark and Hive:
This release updates Hive from 1.0 to 2.1, accompanied by a move to Java 8. Spark is updated from 1.6.2 to 2.0, with a similar move to Scala 2.11. Both the Spark and Hive updates are major releases that include new features, performance improvements, and bug fixes. For example, Spark now includes the Structured Streaming API. The new versions of Spark and Hive are not 100% backward compatible with the old ones, so check your code and upgrade to EMR 5.0.0 with care.
With this release, Tez is the default execution engine for Hive 2.1 and Pig 0.16, replacing Hadoop MapReduce and resulting in better performance, including reduced query latency. After this update, EMR uses MapReduce only when a Hadoop MapReduce job is run directly (Spark has its own framework; Hive and Pig now use Tez).
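If a Hive workload is not yet ready for Tez, the engine can in principle be switched back for a given cluster. The following is a minimal sketch, assuming EMR's configuration-classification format (a list of objects with `Classification` and `Properties`) and the Hive property `hive.execution.engine`. The helper name `hive_engine_config` is made up for illustration and is not part of any AWS library.

```python
import json

# Engines this sketch allows: "tez" (the new default) and "mr" (Hadoop MapReduce).
VALID_ENGINES = {"tez", "mr"}


def hive_engine_config(engine):
    """Build a configuration list that sets Hive's execution engine.

    Hypothetical helper: it only assembles a plain Python structure in the
    shape EMR uses for configuration classifications.
    """
    if engine not in VALID_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    return [
        {
            "Classification": "hive-site",
            "Properties": {"hive.execution.engine": engine},
        }
    ]


if __name__ == "__main__":
    # Print the JSON form, e.g. for pasting into a cluster's software settings.
    print(json.dumps(hive_engine_config("mr"), indent=2))
```

A list like this is the kind of value that can be supplied as the cluster's configurations at launch. Whether falling back to MapReduce is worthwhile depends on the workload.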
User Interface Improvements:
Apache Zeppelin (a notebook for interactive data analytics) is updated from 0.5.6 to 0.6.1, and Hue (an interface for analyzing data with Hadoop) from 3.7.1 to 3.10. The new versions of these web-based tools include new features and many smaller improvements.
Enhanced Debugging Functionality:
This version adds some debugging functionality. You can now find out at which particular step an EMR job failed.
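The same information can also be read programmatically. The sketch below assumes the response shape of the EMR ListSteps API (a `Steps` list whose entries carry `Status.State` and, for failures, `Status.FailureDetails`) and assumes boto3 is installed for the live call. The function names and the default region are illustrative, and only the first page of results is inspected.

```python
def failed_steps(list_steps_response):
    """Return (name, reason, log_file) for every FAILED step in a response.

    Works on a plain dict, so it can be tried without AWS access.
    """
    results = []
    for step in list_steps_response.get("Steps", []):
        status = step.get("Status", {})
        if status.get("State") != "FAILED":
            continue
        # FailureDetails may be absent; fall back to None for its fields.
        details = status.get("FailureDetails", {})
        results.append(
            (step.get("Name"), details.get("Reason"), details.get("LogFile"))
        )
    return results


def fetch_failed_steps(cluster_id, region="us-east-1"):
    """Query a live cluster. Requires boto3 and AWS credentials (assumed)."""
    import boto3  # imported lazily so the parser above has no dependencies

    emr = boto3.client("emr", region_name=region)
    response = emr.list_steps(ClusterId=cluster_id, StepStates=["FAILED"])
    return failed_steps(response)


if __name__ == "__main__":
    # Made-up sample data in the assumed response shape.
    sample = {
        "Steps": [
            {"Name": "load", "Status": {"State": "COMPLETED"}},
            {
                "Name": "transform",
                "Status": {
                    "State": "FAILED",
                    "FailureDetails": {
                        "Reason": "example reason",
                        "LogFile": "s3://example-bucket/logs/stderr.gz",
                    },
                },
            },
        ]
    }
    for name, reason, log_file in failed_steps(sample):
        print(name, reason, log_file)
```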
Launch a Cluster Today:
You can launch an EMR 5.0.0 cluster today in any AWS region.
Follow the steps below:
- Open the EMR Console.
- Click on Create cluster.
- Choose emr-5.0.0 from the Release menu.
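The console steps above have a scripted counterpart. The following is a rough sketch using boto3's `run_job_flow`, assuming boto3 is installed and AWS credentials are configured. The cluster name, instance types, instance count, region, and the two default role names are placeholder assumptions and must match what exists in your account. The helper names are illustrative only.

```python
def build_cluster_request(name, applications, release_label="emr-5.0.0"):
    """Assemble keyword arguments for an EMR run_job_flow call.

    Pure function: it builds a dict and does not contact AWS.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Each chosen application becomes one {"Name": ...} entry.
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "MasterInstanceType": "m3.xlarge",  # placeholder instance type
            "SlaveInstanceType": "m3.xlarge",   # placeholder instance type
            "InstanceCount": 3,                 # placeholder cluster size
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed role names; they must already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-east-1"):
    """Submit the request and return the new cluster ID. Incurs AWS charges."""
    import boto3  # imported lazily so the builder above has no dependencies

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    request = build_cluster_request("emr-5-demo", ["Hadoop", "Hive", "Spark"])
    print(request)  # inspect first; call launch_cluster(request) to create it
```

Keeping the request builder separate from the call that creates the cluster makes it possible to review the release label and the chosen applications before anything is launched.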