Apache Hadoop YARN: Yet Another Resource Negotiator
Apache YARN ("Yet Another Resource Negotiator") is the resource-management layer of Hadoop.
The root of all these problems was that MapReduce had too many responsibilities. It was practically in charge of everything above the HDFS layer: assigning cluster resources and managing job execution (system), doing data processing (engine), and interfacing with clients (API). Consequently, higher-level frameworks had no choice but to build on top of MapReduce.
The ResourceManager is the master daemon that communicates with clients, tracks resources on the cluster, and assigns tasks to NodeManagers.
The ResourceManager has two main components: 1. the Scheduler and 2. the ApplicationManager. The Scheduler decides where each ApplicationMaster will run and where its containers will be scheduled. The ApplicationManager manages the running ApplicationMasters in the cluster, i.e., it is responsible for starting ApplicationMasters and for monitoring them and restarting them on different nodes in case of failure.
The NodeManager is a worker daemon that launches and tracks processes spawned on worker hosts. The NodeManager tracks its own local resources and communicates its resource configuration to the ResourceManager.
The Container is an important YARN concept. We can think of containers as requests to hold resources (CPU and memory) on the YARN cluster.
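As a rough sketch of what such a request carries, the snippet below models a container request as a small data structure. The class and field names here are invented for illustration; they are not YARN's real Java API.

```python
from dataclasses import dataclass

# Hypothetical model of a container request; names are illustrative only,
# not the real org.apache.hadoop.yarn API.
@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int

@dataclass(frozen=True)
class ContainerRequest:
    capability: Resource  # how much memory/CPU the container should hold
    priority: int = 0     # hint to the scheduler

# An application asking for two 1 GiB, single-core containers:
requests = [ContainerRequest(Resource(memory_mb=1024, vcores=1)) for _ in range(2)]
```

The point is that a container is not a fixed slot but a sized reservation: each request names exactly how much memory and CPU it wants to hold.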
For each running application, a special piece of code called an ApplicationMaster helps coordinate tasks on the YARN cluster. It negotiates resources from the ResourceManager and works with the NodeManagers to run and monitor the tasks.
- 1. The application starts and talks to the ResourceManager of the cluster.
- 2. The ResourceManager makes a single container request on behalf of the application, and the ApplicationMaster runs within that container.
- 3. The ApplicationMaster requests subsequent containers from the ResourceManager, which are allocated to run tasks for the application. The tasks do most of their status communication with the ApplicationMaster.
- 4. Once all tasks are finished, the ApplicationMaster exits and the last container is de-allocated from the cluster.
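The four steps above can be sketched as a minimal simulation. The class and function names are invented for illustration; in real YARN these are separate daemons communicating over RPC.

```python
# Toy simulation of the application lifecycle; names are hypothetical.
class ResourceManager:
    def __init__(self):
        self._next_id = 0

    def allocate(self, n=1):
        """Grant n container ids (step 2 for the AM, step 3 for tasks)."""
        ids = list(range(self._next_id, self._next_id + n))
        self._next_id += n
        return ids

def run_application(rm, num_tasks):
    am_container = rm.allocate(1)[0]          # step 2: AM runs in one container
    task_containers = rm.allocate(num_tasks)  # step 3: AM requests task containers
    results = [f"task-{c}-done" for c in task_containers]  # tasks report to the AM
    return results                            # step 4: AM exits, containers freed

results = run_application(ResourceManager(), 3)
```

Note that the first container allocated belongs to the ApplicationMaster itself; only the subsequent ones run the application's tasks.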
While the application is running, the RM is not in the loop at all. The AM communicates directly with the client and with the containers it runs, which means that if the RM were to crash, the application would just keep running.
By analogy, the ResourceManager occupies the place of the JobTracker of Hadoop v1, and the NodeManager is a more generic and efficient version of the TaskTracker. In contrast to a fixed number of slots for map and reduce tasks, the NodeManager has a number of dynamically created resource containers; there is no hard-coded split into map and reduce slots as in Hadoop v1.
If a task (container) fails, the AM is responsible for updating its demand to compensate.
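A sketch of that compensation logic, with hypothetical names: when containers fail, the AM drops them from its running set and raises its outstanding demand so replacements are requested.

```python
# Illustrative only: how an AM might adjust its demand after failures.
def compensate(outstanding, running, failed):
    """Drop failed containers and add one new request per failure."""
    still_running = [c for c in running if c not in failed]
    return outstanding + len(failed), still_running

outstanding, running = compensate(0, ["c1", "c2", "c3"], {"c2"})
```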
Once the ResourceManager is able to allocate a resource to the ApplicationMaster, it generates a lease that the ApplicationMaster pulls on a subsequent heartbeat. A security token associated with the lease guarantees its authenticity when the ApplicationMaster presents the lease to the NodeManager to gain access to the container.
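The lease/token handshake can be sketched like this: the RM signs each lease with a cluster secret, and the NodeManager verifies the signature before granting access to the container. The secret and helper names are assumptions for illustration, not YARN's actual token implementation.

```python
import hashlib
import hmac

# Hypothetical shared secret; real YARN uses managed container tokens.
CLUSTER_SECRET = b"shared-cluster-secret"

def sign_lease(container_id: str) -> str:
    """RM side: attach a security token to the lease."""
    return hmac.new(CLUSTER_SECRET, container_id.encode(), hashlib.sha256).hexdigest()

def node_manager_accepts(container_id: str, token: str) -> bool:
    """NM side: only honor leases whose token checks out."""
    return hmac.compare_digest(token, sign_lease(container_id))
```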
The ApplicationMaster heartbeats to the ResourceManager to communicate its changing resource needs, and to let the ResourceManager know it is still alive. In response, the ResourceManager can return a lease on additional containers on other nodes, or cancel the lease on some containers.
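One heartbeat exchange might look like the sketch below: the AM reports how many more containers it needs, and the RM answers with new leases from its free pool plus any cancellations. All names, and the choice to return cancelled leases to the pool, are assumptions for illustration.

```python
# Toy model of a single AM->RM heartbeat; names are hypothetical.
def heartbeat(free_containers, needed, to_cancel=()):
    """Return (granted leases, cancelled leases, remaining free pool)."""
    granted = free_containers[:needed]
    remaining = free_containers[needed:] + list(to_cancel)  # cancelled go back to pool
    return granted, list(to_cancel), remaining

granted, cancelled, free = heartbeat(
    ["node1:c1", "node2:c2"], needed=1, to_cancel=["node3:c9"]
)
```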
The Scheduler inside the RM has a pluggable policy plug-in, which is responsible for partitioning the cluster resources among the various queues, applications, etc. Depending on the use case and business needs, administrators may select a simple FIFO (first in, first out), capacity, or fair-share scheduler. (http://www.corejavaguru.com/bigdata/hadoop-tutorial/yarn-scheduler)
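The pluggable-policy idea can be sketched as follows: the RM delegates the ordering decision to a policy object, so FIFO or fair-share behavior can be swapped in by configuration. Class names are invented for illustration.

```python
# Hypothetical policy plug-ins; real YARN schedulers are far richer.
class FifoPolicy:
    def order(self, apps):
        return sorted(apps, key=lambda a: a["submit_time"])  # oldest first

class FairSharePolicy:
    def order(self, apps):
        return sorted(apps, key=lambda a: a["allocated"])  # least-served first

def next_app(policy, apps):
    """Ask the plugged-in policy which application to serve next."""
    return policy.order(apps)[0]["name"]

apps = [
    {"name": "A", "submit_time": 1, "allocated": 5},
    {"name": "B", "submit_time": 2, "allocated": 0},
]
```

With the same application list, the FIFO policy picks the earliest-submitted app, while the fair-share policy picks the one that has received the least so far.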