
YARN

Now that we have gained some hands-on experience with the MapReduce framework, recall that MapReduce (version 1) relies on a service called the Job Tracker. The Job Tracker does the following:

It gets information from the NameNode to determine where the data required by the client program resides
It coordinates with the respective Task Trackers to assign tasks
It monitors the life cycle of each Task Tracker's activities
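To make these three duties concrete, here is a toy Python model, not Hadoop code, in which a single object performs all of them. `BLOCK_LOCATIONS`, `TaskTracker`, and `ToyJobTracker` are hypothetical names. The NameNode is reduced to a dictionary of block replicas. The placement rule, "least-loaded tracker on a host that stores the data", is an assumption for illustration and is not the actual Hadoop 1 scheduler.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the NameNode's metadata: which hosts hold a
# replica of each input block.
BLOCK_LOCATIONS = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-b", "node-c"],
    "block-3": ["node-c", "node-a"],
}


@dataclass
class TaskTracker:
    host: str
    assigned: list = field(default_factory=list)
    heartbeats: int = 0


class ToyJobTracker:
    """A single coordinator that carries out all three Job Tracker duties."""

    def __init__(self, namenode_view, trackers):
        self.namenode_view = namenode_view
        self.trackers = {t.host: t for t in trackers}

    def schedule(self, blocks):
        for block in blocks:
            # Duty 1: ask the "NameNode" where this block's data lives.
            hosts = self.namenode_view.get(block, [])
            local = [self.trackers[h] for h in hosts if h in self.trackers]
            # Duty 2: assign the task, preferring a Task Tracker on a host
            # that stores the data (assumed rule: least-loaded local tracker,
            # falling back to any tracker if none is local).
            candidates = local or list(self.trackers.values())
            tracker = min(candidates, key=lambda t: len(t.assigned))
            tracker.assigned.append(block)

    def heartbeat(self, host):
        # Duty 3: every Task Tracker reports its progress to this one process.
        self.trackers[host].heartbeats += 1


trackers = [TaskTracker("node-a"), TaskTracker("node-b"), TaskTracker("node-c")]
jt = ToyJobTracker(BLOCK_LOCATIONS, trackers)
jt.schedule(["block-1", "block-2", "block-3"])
for t in trackers:
    jt.heartbeat(t.host)
print({t.host: t.assigned for t in trackers})
# prints {'node-a': ['block-1'], 'node-b': ['block-2'], 'node-c': ['block-3']}
```

Every scheduling decision and every heartbeat passes through the one `ToyJobTracker` instance. That concentration of work is what the following paragraph is concerned with.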

In a small setup, it seems perfectly acceptable for a single machine to handle the preceding activities. Now consider the same services in a production environment with a large cluster, where thousands of jobs are running. The Job Tracker then has to perform all of these activities for every job. This places a heavy burden on one service and makes for an inelegant design. It can also become a performance bottleneck, because every new job arrives at the same single point of contact in Hadoop.

The second problem with the Hadoop 1 framework is resource allocation. Assume we have four machines in a cluster: two with 8 GB of RAM and two with 16 GB of RAM, all with the same storage capacity. When the Job Tracker assigns a task to a Task Tracker, that Task Tracker is only responsible for processing the task within its own machine's resources. In this scenario, all machines are treated the same, so a Task Tracker will run a task whether its machine has 8 GB or 16 GB of RAM available.
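The four-machine cluster above can illustrate the difference between treating every machine the same and accounting for each machine's memory. This is a toy model, not the Hadoop 1 or YARN scheduler. The function names are hypothetical, and the 2 GB per-task requirement is an assumption chosen for illustration.

```python
# Cluster from the example above: two 8 GB machines and two 16 GB machines.
NODES_GB = {"node-1": 8, "node-2": 8, "node-3": 16, "node-4": 16}
TASK_GB = 2  # hypothetical memory needed by each task


def uniform_assign(nodes, num_tasks):
    """Treat every machine the same: deal tasks out round-robin."""
    counts = {name: 0 for name in nodes}
    names = list(nodes)
    for i in range(num_tasks):
        counts[names[i % len(names)]] += 1
    return counts


def memory_aware_assign(nodes, num_tasks, task_gb):
    """Place each task on the node with the most free memory that can fit it."""
    free = dict(nodes)
    counts = {name: 0 for name in nodes}
    for _ in range(num_tasks):
        name = max(free, key=free.get)
        if free[name] < task_gb:
            break  # cluster is full; remaining tasks would have to wait
        free[name] -= task_gb
        counts[name] += 1
    return counts


def overcommitted(nodes, counts, task_gb):
    """Return the nodes asked to use more memory than they have."""
    return [n for n, c in counts.items() if c * task_gb > nodes[n]]


uniform = uniform_assign(NODES_GB, 24)
aware = memory_aware_assign(NODES_GB, 24, TASK_GB)
print(uniform, overcommitted(NODES_GB, uniform, TASK_GB))
print(aware, overcommitted(NODES_GB, aware, TASK_GB))
```

With 24 two-gigabyte tasks, the uniform policy gives every machine six tasks, so each 8 GB machine is asked for 12 GB. The memory-aware policy instead places four tasks on each 8 GB machine and eight on each 16 GB machine.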

To overcome these architectural issues, Hadoop introduced a new processing model called YARN (Yet Another Resource Negotiator), which replaces the Job Tracker and Task Tracker model. Because YARN arrived with Hadoop 2, this architecture is often referred to simply as Hadoop 2. The following figure illustrates Hadoop's architecture with YARN included:

YARN divides the functionality of the Job Tracker into separate services. It works with the following four new components:

Resource Manager
Node Manager
Container
Application Master
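The way these components cooperate can be sketched as a small Python simulation. It is a simplified model under stated assumptions, not Hadoop's API. Every class and method name is hypothetical, memory is the only resource considered, and the Resource Manager uses an assumed "most free memory first" placement rule. The per-application coordinator is modeled on what Hadoop's documentation calls the ApplicationMaster: it asks the Resource Manager for Containers and then has the Node Managers launch them.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Container:
    """A grant of resources (here only memory) on one specific node."""
    container_id: int
    host: str
    memory_mb: int


@dataclass
class NodeManager:
    """Runs on each worker machine and launches containers there."""
    host: str
    capacity_mb: int
    running: dict = field(default_factory=dict)

    def launch(self, container, task):
        self.running[container.container_id] = task


class ResourceManager:
    """Cluster-wide arbiter: it tracks free memory and grants containers,
    but does not follow the progress of individual tasks."""

    def __init__(self):
        self.free_mb = {}
        self._ids = itertools.count(1)

    def register(self, node):
        self.free_mb[node.host] = node.capacity_mb

    def allocate(self, memory_mb, count):
        granted = []
        if not self.free_mb:
            return granted  # no Node Managers registered yet
        for _ in range(count):
            # Assumed placement rule: the node with the most free memory.
            host = max(self.free_mb, key=self.free_mb.get)
            if self.free_mb[host] < memory_mb:
                break  # nothing fits right now; the rest must wait
            self.free_mb[host] -= memory_mb
            granted.append(Container(next(self._ids), host, memory_mb))
        return granted


class ApplicationMaster:
    """One per application: requests containers and tracks its own tasks."""

    def __init__(self, rm, node_managers, tasks, memory_mb):
        self.rm = rm
        self.node_managers = node_managers
        self.pending = list(tasks)
        self.memory_mb = memory_mb
        self.placement = {}

    def run(self):
        containers = self.rm.allocate(self.memory_mb, len(self.pending))
        for task, container in zip(self.pending, containers):
            # The application, not the Resource Manager, asks the
            # Node Manager to start the work.
            self.node_managers[container.host].launch(container, task)
            self.placement[task] = container.host
        self.pending = self.pending[len(containers):]
        return self.placement


# Same cluster shape as the earlier example: two 8 GB and two 16 GB machines.
nms = [NodeManager("nm-1", 8192), NodeManager("nm-2", 8192),
       NodeManager("nm-3", 16384), NodeManager("nm-4", 16384)]
rm = ResourceManager()
for nm in nms:
    rm.register(nm)
am = ApplicationMaster(rm, {nm.host: nm for nm in nms},
                       [f"task-{i}" for i in range(6)], 4096)
print(am.run())
```

The design point the sketch tries to capture is the split of responsibilities. The `ResourceManager` only hands out memory on each node. Tracking which task runs where is left to the per-application master rather than to one cluster-wide Job Tracker.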

The following figure illustrates YARN's model and where the preceding services will run:

YARN supports programs written against the older API, that is, MapReduce v1, as well as programs written against the newer APIs.