Friday, November 17, 2023

Fair Scheduler in YARN

In the post YARN in Hadoop we have already seen that the scheduler component of the ResourceManager is responsible for allocating resources to the running jobs. The scheduler is pluggable in Hadoop, and the two main options are the capacity scheduler and the fair scheduler. This post talks about the fair scheduler in YARN, its benefits, and how it can be configured in a Hadoop cluster.

YARN Fair Scheduler

Fair scheduler in YARN allocates resources to applications in such a way that all apps get, on average, an equal share of resources over time. By default, the Fair Scheduler bases scheduling fairness decisions only on memory. It can be configured to schedule with both memory and CPU, in the form (X mb, Y vcores).
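
As a rough sketch of the memory-plus-CPU setup, the Dominant Resource Fairness policy ("drf") can be made the default policy for all queues in the allocation file that is covered later in this post. This is only an illustrative fragment, not a complete configuration:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Illustrative only: use Dominant Resource Fairness, which considers
       both memory and vcores, as the default policy for every queue -->
  <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
</allocations>
```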

When there is a single app running, that app uses the entire Hadoop cluster. When other apps are submitted, they don't have to wait for the running app to finish; resources that free up are assigned to the new apps, so that each app eventually gets roughly the same amount of resources.


Queues in Fair Scheduler

Fair scheduler organizes apps further into “queues”, and shares resources fairly between these queues.

By default, all users share a single queue named “default”. If an app specifically lists a queue in a container resource request, the request is submitted to that queue. The queue placement policy can also be configured to assign apps based on the user name, the primary group of the user, or a secondary group of the user.

Within each queue, a separate scheduling policy can be used to share resources between the running apps. The default is memory-based fair sharing, but FIFO and multi-resource with Dominant Resource Fairness can also be configured.

As an example, if there are two queues A and B, where queue A uses the fair scheduling policy and queue B uses the FIFO policy, then the jobs submitted to queue A share its resources fairly, whereas the jobs submitted to queue B get its resources on a first-come, first-served basis.
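
The two-queue example above could be written roughly as follows in the allocation file described later in this post. This is a minimal sketch that sets only the scheduling policy; a real configuration would normally also set resources and weights:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Apps in queue A share the queue's resources fairly -->
  <queue name="A">
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
  <!-- Apps in queue B are served in submission order -->
  <queue name="B">
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>
</allocations>
```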

Hierarchical Queues in Fair Scheduler

The fair scheduler in YARN supports hierarchical queues, which means sub-queues can be created within a dedicated queue. All queues descend from a queue named “root”.

A queue’s name starts with the names of its parents, with periods as separators. So a queue named “parent1” under the root queue would be referred to as “root.parent1”, and a queue named “queue2” under a queue named “parent1” would be referred to as “root.parent1.queue2”.
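
In the allocation file such a hierarchy is expressed by nesting queue elements. A minimal sketch using the queue names from the paragraph above, with the root queue left implicit:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Referred to as root.parent1 -->
  <queue name="parent1">
    <!-- Referred to as root.parent1.queue2 -->
    <queue name="queue2" />
  </queue>
</allocations>
```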

YARN fair scheduler configuration

To use the Fair Scheduler, first assign the appropriate scheduler class in yarn-site.xml:

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property> 
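
The scheduler also needs to know where the allocation file with the queue definitions is located. The yarn.scheduler.fair.allocation.file property in yarn-site.xml controls this and defaults to fair-scheduler.xml when it is not set. A minimal sketch, where the path shown is only an assumed example location:

```xml
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <!-- Hypothetical path; adjust to wherever the allocation file is kept -->
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
```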

To set up queues, changes are made in the configuration file etc/hadoop/fair. Some of the elements that can be configured are as follows-

1- queue element: represents a queue. A queue element can take an optional attribute ‘type’, which when set to ‘parent’ makes it a parent queue. Each queue element may contain the following properties:
  • minResources: minimum resources the queue is entitled to, in the form “X mb, Y vcores”.
  • maxResources: maximum resources a queue is allocated, expressed either in absolute values (X mb, Y vcores) or as a percentage of the cluster resources (X% memory, Y% cpu).
  • maxChildResources: maximum resources any child queue is allocated, expressed either in absolute values (X mb, Y vcores) or as a percentage of the cluster resources (X% memory, Y% cpu).
  • weight: to share the cluster non-proportionally with other queues. Weights default to 1, and a queue with weight 2 should receive approximately twice as many resources as a queue with the default weight.
  • schedulingPolicy: to set the scheduling policy of any queue. The allowed values are “fifo”/“fair”/“drf” or any class that extends org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.SchedulingPolicy. Defaults to “fair”.

2- queueMaxResourcesDefault element: sets the default max resource limit for queues; overridden by the maxResources element in each queue.

3- defaultQueueSchedulingPolicy element: sets the default scheduling policy for queues; overridden by the schedulingPolicy element in each queue if specified. Defaults to “fair”.

4- queuePlacementPolicy element: contains a list of rule elements that tell the scheduler how to place incoming apps into queues. Rules are applied in the order that they are listed. All rules accept the “create” argument, which indicates whether the rule can create a new queue. “create” defaults to true; if it is set to false and the rule would place the app in a queue that is not configured in the allocations file, the scheduler continues on to the next rule. Valid rules are:
  • specified: the app is placed into the queue it requested.
  • user: the app is placed into a queue with the name of the user who submitted it.
  • primaryGroup: the app is placed into a queue with the name of the primary group of the user who submitted it.
  • secondaryGroupExistingQueue: the app is placed into a queue with a name that matches a secondary group of the user who submitted it.
  • nestedUserQueue: the app is placed into a queue with the name of the user under the queue suggested by the nested rule.
  • default: the app is placed into the queue specified in the ‘queue’ attribute of the default rule.
  • reject: the app is rejected.
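
To show how the rules combine, including a nested rule, here is a small sketch of a placement policy. The queue name sample_queue is a placeholder and not part of any real setup; the rules are tried top to bottom until one of them places the app:

```xml
<queuePlacementPolicy>
  <!-- 1. Use the queue the app asked for, if any -->
  <rule name="specified" />
  <!-- 2. Otherwise use a queue named after the user's primary group,
          but only if that queue is already configured -->
  <rule name="primaryGroup" create="false" />
  <!-- 3. Otherwise place the app in a per-user queue under an existing
          queue that matches one of the user's secondary groups -->
  <rule name="nestedUserQueue">
    <rule name="secondaryGroupExistingQueue" create="false" />
  </rule>
  <!-- 4. Fall back to a fixed queue (placeholder name) -->
  <rule name="default" queue="sample_queue" />
</queuePlacementPolicy>
```

Ending the list with a default (or reject) rule ensures that every app is either placed somewhere or turned away explicitly.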

Fair scheduler queue configuration example

Suppose there are two child queues under root, XYZ and ABC, and the XYZ queue is further divided into two child queues, technology and development. The allocation file could then look like this:

<?xml version="1.0"?>
<allocations>
  <queue name="ABC">
    <minResources>10000 mb,10vcores</minResources>
    <maxResources>60000 mb,30vcores</maxResources>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>

  <queue name="XYZ">
    <minResources>20000 mb,0vcores</minResources>
    <maxResources>80000 mb,0vcores</maxResources>
    <weight>3.0</weight>
    <schedulingPolicy>fifo</schedulingPolicy>
    <queue name="technology" />
    <queue name="development" />
  </queue>

  <queueMaxResourcesDefault>40000 mb,20vcores</queueMaxResourcesDefault>

  <queuePlacementPolicy>
    <rule name="specified" />
    <rule name="primaryGroup" create="false" />
    <rule name="default" queue="ABC" />
  </queuePlacementPolicy>
</allocations>

Reference: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/FairScheduler.html

That's all for this topic Fair Scheduler in YARN. If you have any doubt or any suggestions to make please drop a comment. Thanks!
