Monday, November 13, 2023

HDFS Federation in Hadoop Framework

In this post we’ll talk about the HDFS Federation feature introduced in Hadoop 2.x versions. With HDFS federation we can have more than one NameNode in the Hadoop cluster each managing a part of the namespace.

HDFS architecture limitations

HDFS follows a master/slave architecture where NameNode acts as a master and then there are several DataNodes. NameNode in the HDFS architecture performs the following tasks.

  1. Managing the Namespace– NameNode manages the file system namespace. NameNode stores metadata about the files and directories in the file system.

    NameNode also supports all the namespace related file system operations such as create, delete, modify and list files and directories.

  2. Block Storage Service- NameNode also maintains block mapping i.e. in which DataNode any given block is stored.

    NameNode Supports block related operations such as create, delete, modify and get block location.

In the prior HDFS architecture (Before Hadoop 2.x) only single namespace for the entire cluster is allowed and a single Namenode manages the namespace. In a large clusters with many files having a single NameNode may become a limiting factor for memory needs and scaling.

HDFS Federation tries to address this problem.

HDFS Federation support for multiple NameNodes/NameSpaces

HDFS Federation addresses this limitation of having a single NameNode by adding support for multiple Namenodes/namespaces to HDFS. In HDFS Federation architecture NameNodes manages a part of the namespace. Thus HDFS Federation adds support for namespace horizontal scaling.

As Example– If there are two namespaces /sales and /finance then you can have two Namenodes; NameNode1 and NameNode2 where NameNode1 manages all files under /sales and NameNode2 manages all files under /finance.

In HDFS federation Namenodes are federated; the Namenodes are independent and do not require coordination with each other. Thus, a Namenode failure does not prevent the Datanode from serving other Namenodes in the cluster.

Note that Datanodes are still used as common storage for blocks by all the Namenodes. Each Datanode registers with all the Namenodes in the cluster. Datanodes send periodic heartbeats and block reports.

HDFS Federation Namespace Volume

In HDFS Federation a set of blocks that belong to a single namespace is known as Block Pool.

A Namespace and its block pool together are called Namespace Volume. It is a self-contained unit of management.

HDFS Federation configuration

If we take the same example of having two namespaces /sales and /finance and two namenodes NameNode1 and NameNode2 then the required configuration changes are as follows.

You need to add the dfs.nameservices parameter to your configuration file (hdfs-site.xml) and configure it with a list of comma separated NameServiceIDs.

<property>
  <name>dfs.nameservices</name>
  <value>sales,finance</value>
</property>

You also need to add following configuration parameters suffixed with the corresponding NameServiceID.

<property>
  <name>dfs.namenode.rpc-address.sales</name>
  <value>namenode1:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.sales</name>
  <value>namenode1:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.finance</name>
  <value>namenode2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.finance</name>
  <value>namenode2:50070</value>
</property>
You can use ViewFs to create personalized namespace views. That change is required in core-site.xml.
<property>
  <name>fs.defaultFS</name>
  <value>viewfs:///</value>
</property>
Then the mount table config variables for sales and finance.
<property>
  <name>fs.viewfs.mounttable.default.link./sales</name>
  <value>hdfs://namenode1:8020/sales</value>
</property>

<property>
  <name>fs.viewfs.mounttable.default.link./finance</name>
  <value>hdfs://namenode2:8020/finance</value>
</property>

Reference: https://hadoop.apache.org/docs/r2.9.0/api/org/apache/hadoop/fs/viewfs/ViewFs.html

That's all for this topic HDFS Federation in Hadoop Framework. If you have any doubt or any suggestions to make please drop a comment. Thanks!

>>>Return to Hadoop Framework Tutorial Page


Related Topics

  1. HDFS High Availability
  2. Replica Placement Policy in Hadoop Framework
  3. What is SafeMode in Hadoop
  4. File Read in HDFS - Hadoop Framework Internal Steps
  5. Java Program to Write File in HDFS

You may also like-

  1. Installing Hadoop on a Single Node Cluster in Pseudo-Distributed Mode
  2. Introduction to Hadoop Framework
  3. HDFS Commands Reference List
  4. How to Compress Intermediate Map Output in Hadoop
  5. How to Configure And Use LZO Compression in Hadoop
  6. Installing Ubuntu Along With Windows
  7. Java Exception Handling Interview Questions
  8. Finding Duplicate Elements in an Array - Java Program