
The Hadoop Distributed File System

Published Jan 24, 2019

Originally built as infrastructure for the Apache Nutch web search engine project, the Hadoop Distributed File System (HDFS) is now a separate Apache Hadoop subproject that can be found at http://hadoop.apache.org/hdfs/. HDFS is designed to run on commodity hardware. While there are many similarities with existing distributed file systems, the differences are significant.

HDFS is highly fault-tolerant, designed to be deployed on low-cost hardware, while also providing high throughput access to application data. In this article, you’ll see how HDFS works and its key features. Dive in!

How HDFS works

When you set up a Hadoop cluster, Hadoop creates a virtual layer on top of your local filesystem (such as a Windows- or Linux-based filesystem). HDFS does not map directly to any physical filesystem of the operating system; instead, Hadoop provides an abstraction over your local filesystem to deliver a fault-tolerant, distributed filesystem service. The overall design and access pattern of HDFS is similar to that of a Linux-based filesystem. The following diagram shows the high-level architecture of HDFS:

[Figure: high-level HDFS architecture]

Each file sent to HDFS is sliced into a number of blocks that are distributed across the cluster. The NameNode maintains the registry (or name table) of all files and blocks in the local filesystem path specified by dfs.namenode.name.dir in hdfs-site.xml, whereas the Secondary NameNode replicates this information through periodic checkpoints. You can run more than one Secondary NameNode. Typically, the NameNode stores information pertaining to the directory structure, permissions, the mapping of files to blocks, and so forth.
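For reference, these settings live in hdfs-site.xml; the path and values below are illustrative, not defaults you must use:

```xml
<!-- hdfs-site.xml (illustrative values) -->
<configuration>
  <!-- Where the NameNode persists its namespace (FSimage and Editlogs) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/namenode</value>
  </property>
  <!-- Block size each file is sliced into (128 MB here) -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <!-- Number of replicas kept for each block -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```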

This filesystem metadata is persisted in two forms: FSimage and Editlogs. FSimage is a snapshot of the NameNode's filesystem metadata at a given point in time, whereas Editlogs record all changes made since the last snapshot stored in FSimage. Because FSimage is a data structure optimized for reading, HDFS captures namespace changes in Editlogs to ensure durability. Hadoop provides an offline image viewer to dump FSimage data into a human-readable format.
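As a quick illustration, the offline image viewer (hdfs oiv) can dump an FSimage into XML. The image filename below is hypothetical, and the command is echoed rather than executed so the sketch is safe to run anywhere:

```shell
#!/bin/sh
# Hypothetical checkpoint file; look under <dfs.namenode.name.dir>/current on your NameNode.
FSIMAGE="fsimage_0000000000000000042"

# Convert the binary FSimage into human-readable XML with the offline image viewer.
CMD="hdfs oiv -p XML -i ${FSIMAGE} -o fsimage.xml"
echo "${CMD}"   # drop the echo (run the command directly) on a real NameNode host
```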

Key features of HDFS

In this section, we will go over some of the marquee features of HDFS that offer advantages for Hadoop users, such as multi-tenancy, snapshots, safe mode, and others.

Achieving multi-tenancy in HDFS

HDFS supports multi-tenancy through Linux-like Access Control Lists (ACLs) on its filesystem. Working across multiple tenants boils down to controlling access for different users through the HDFS command-line interface. The HDFS administrator can add tenant spaces to HDFS through its namespace (or directory structure), for example, hdfs://<host>:<port>/tenant/<tenant-id>. The default namespace parameter can be specified in hdfs-site.xml.

It is important to note that HDFS uses the local filesystem's users and groups as its own, and it does not govern or validate whether a given group exists. Typically, one group is created per tenant, and users who are part of that group get access to all of that group's artifacts. Alternatively, the user identity of a client process can be established through a Kerberos principal. Similarly, HDFS supports attaching LDAP servers for group resolution. With a local filesystem, multi-tenancy can be achieved with the following steps:

  1. Create a group for each tenant, and add users to this group in local FS
  2. Create a new namespace for each tenant, for example, /tenant/<tenant-id>
  3. Make the tenant the complete owner of that directory through the chown command
  4. Set group access permissions on the /tenant/<tenant-id> directory
  5. Set up a quota for each tenant through dfsadmin -setSpaceQuota <size> <path> to control the size of files created by each tenant

> Note
HDFS does not provide any control over the creation of users and groups or the processing of user tokens. Its user identity management is handled externally by third-party systems.
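The five steps above can be sketched as a script. The tenant, user, and quota below are hypothetical, and the commands are echoed for review rather than executed; on a real cluster, remove the run() indirection:

```shell
#!/bin/sh
# Hypothetical tenant and user names -- substitute your own.
TENANT="acme"
USER_NAME="alice"
TENANT_DIR="/tenant/${TENANT}"

# Echo each command instead of executing it, so the sketch runs anywhere.
run() { echo "$@"; }

run sudo groupadd "${TENANT}"                                    # 1. group per tenant in local FS
run sudo usermod -a -G "${TENANT}" "${USER_NAME}"                # 1. add users to the group
run hdfs dfs -mkdir -p "${TENANT_DIR}"                           # 2. namespace per tenant
run hdfs dfs -chown -R "${USER_NAME}:${TENANT}" "${TENANT_DIR}"  # 3. tenant owns the directory
run hdfs dfs -chmod -R 770 "${TENANT_DIR}"                       # 4. group-only access
run hdfs dfsadmin -setSpaceQuota 500g "${TENANT_DIR}"            # 5. cap tenant storage
```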

Snapshots of HDFS

Creating snapshots in HDFS lets you capture the state of the filesystem and preserve it. These snapshots can be used for data backup and provide disaster recovery in the event of data loss. Before you take a snapshot, you need to make the directory snapshottable. Use the following command:

hrishikesh@base0:/$  ./bin/hdfs dfsadmin -allowSnapshot <path>

Once this is run, you will get a message stating that it has succeeded. You are now ready to create a snapshot; run the following command:

hrishikesh@base0:/$  ./bin/hdfs dfs -createSnapshot <path> <snapshot-name>

Once this is done, you will get the directory path where the snapshot was taken, and you can access the contents of your snapshot. The following screenshot shows the overall snapshot run:
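Once created, a snapshot is exposed under a read-only .snapshot directory inside the snapshottable path. A sketch, with a hypothetical path, snapshot name, and file (commands echoed for review rather than executed):

```shell
#!/bin/sh
# Hypothetical snapshottable directory and snapshot name.
DATA_DIR="/data/sales"
SNAP="snap-2019-01-24"

run() { echo "$@"; }   # echo instead of execute; remove on a real cluster

run hdfs dfs -ls "${DATA_DIR}/.snapshot/${SNAP}"                            # browse snapshot contents
run hdfs dfs -cp "${DATA_DIR}/.snapshot/${SNAP}/orders.csv" "${DATA_DIR}/"  # restore one file
```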

[Screenshot: output of the snapshot commands]

You can access a full list of snapshot-related operations, such as renaming a snapshot and deleting a snapshot, here (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html).

Safe mode

When a NameNode starts, it looks for FSimage and loads it into memory. It then looks for past edit logs and applies them to FSimage, creating a new FSimage. After this process is complete, the NameNode starts serving requests over HTTP and other protocols. DataNodes hold the information about the locations of blocks; as the NameNode loads up, the DataNodes report this information to it. During this time, the system runs in safe mode. Safe mode is exited once the dfs.replication.min value for each block is met.
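The exit condition is configurable in hdfs-site.xml. The property names below follow current Hadoop releases (dfs.replication.min is the older name of the first property), and the values shown are the usual defaults:

```xml
<!-- hdfs-site.xml (illustrative values) -->
<configuration>
  <!-- Minimum replicas a block needs before it counts as "safe" -->
  <property>
    <name>dfs.namenode.replication.min</name>
    <value>1</value>
  </property>
  <!-- Fraction of blocks that must be safe before the NameNode leaves safe mode -->
  <property>
    <name>dfs.namenode.safemode.threshold-pct</name>
    <value>0.999f</value>
  </property>
</configuration>
```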

HDFS provides a command to check whether a given filesystem is running in safe mode:

hrishikesh@base0:/$  ./bin/hdfs dfsadmin -safemode get

This tells you whether safe mode is on. While in safe mode, the filesystem provides read-only access to its repository. Similarly, the administrator can choose to enter safe mode with the following command:

hrishikesh@base0:/$  ./bin/hdfs dfsadmin -safemode enter

A corresponding safemode leave option is also provided.
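There is also a safemode wait option, which blocks until safe mode is exited; this is handy in startup scripts that should not write before the NameNode is fully up. A sketch (follow-on step hypothetical; commands echoed for review rather than executed):

```shell
#!/bin/sh
run() { echo "$@"; }   # echo instead of execute; remove on a real cluster

# Block until the NameNode leaves safe mode, then kick off a write.
run hdfs dfsadmin -safemode wait
run hdfs dfs -put /tmp/input.csv /data/input/   # hypothetical follow-on step
```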

Hot swapping

HDFS allows users to hot swap DataNode storage drives on a live cluster. The associated Hadoop JIRA issue is listed here (https://issues.apache.org/jira/browse/HDFS-664). Please note that hot swapping has to be supported by the underlying hardware; if it is not, you may have to restart the affected DataNode after replacing its storage device.

However, before HDFS begins re-replicating the affected blocks elsewhere, you should provide the replacement DataNode volume. The new volume should be formatted and, once that is done, the user should update dfs.datanode.data.dir in the configuration. After this, the user should run the reconfiguration using the dfsadmin command, as shown here:

hrishikesh@base0:/$ ./bin/hdfs dfsadmin -reconfig datanode HOST:PORT start

Once this activity is complete, the user can remove the problematic storage device from the DataNode.
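Putting the steps together as a sketch (the mount point and DataNode address are hypothetical; 9867 is the usual DataNode IPC port in Hadoop 3; commands echoed for review rather than executed):

```shell
#!/bin/sh
run() { echo "$@"; }   # echo instead of execute; remove on a real cluster

NEW_DISK="/mnt/disk3"            # hypothetical mount point of the replacement drive
DATANODE="dn1.example.com:9867"  # hypothetical DataNode IPC address

# 1. Format and mount the new volume (outside HDFS), then add ${NEW_DISK}
#    to dfs.datanode.data.dir in hdfs-site.xml on that DataNode.
# 2. Ask the DataNode to reload its storage configuration.
run hdfs dfsadmin -reconfig datanode "${DATANODE}" start
# 3. Poll until the reconfiguration task finishes.
run hdfs dfsadmin -reconfig datanode "${DATANODE}" status
```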

Federation

HDFS provides federation capabilities for its users, which also helps with multi-tenancy. Previously, each HDFS cluster worked with a single namespace, limiting horizontal scalability. With HDFS Federation, a Hadoop cluster can now scale horizontally.

A block pool represents a single namespace containing a group of blocks. Each NameNode in the cluster is directly correlated to one block pool. Since DataNodes are agnostic to namespaces, the responsibility of managing blocks pertaining to any namespace stays with the NameNode.

Even if the NameNode for any federated tenant goes down, the remaining NameNodes and DataNodes can function without any failures. The document here (https://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-hdfs/Federation.html) covers the configuration for HDFS Federation.
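A minimal federation setup declares each nameservice and maps it to a NameNode (and hence to one block pool). The host names below are hypothetical:

```xml
<!-- hdfs-site.xml (illustrative two-namespace federation) -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <!-- One NameNode per namespace -->
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn2.example.com:8020</value>
  </property>
</configuration>
```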

Intra-DataNode balancer

The need for an intra-DataNode balancer arose for two reasons. First, when a disk is replaced, the DataNode's disks need to be re-balanced based on the available space. Second, with the default round-robin scheduling in Hadoop, mass file deletion on certain DataNodes leads to unbalanced storage across a DataNode's disks.

This was raised as JIRA issue HDFS-1312 (https://issues.apache.org/jira/browse/HDFS-1312), and it was fixed in Hadoop 3.0-alpha1. The new disk balancer supports reporting and balancing functions. The following table describes all the available commands:

[Table: available disk balancer commands]

Today, the system supports two disk-scheduling policies: round-robin, and available-space-based, which distributes writes according to the percentage of free space on each disk.
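A typical balancing run on one DataNode looks like the following sketch (host name hypothetical; the feature must be enabled with dfs.disk.balancer.enabled=true in hdfs-site.xml; commands echoed for review rather than executed):

```shell
#!/bin/sh
run() { echo "$@"; }   # echo instead of execute; remove on a real cluster

DATANODE="dn1.example.com"   # hypothetical DataNode host

run hdfs diskbalancer -plan "${DATANODE}"               # compute a move plan across the node's disks
run hdfs diskbalancer -execute "${DATANODE}.plan.json"  # run the plan file written by -plan
run hdfs diskbalancer -query "${DATANODE}"              # check progress
```

The -plan step prints the location of the generated plan file; pass that path to -execute.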

Hope you enjoyed reading this article. If you want to learn more about Apache Hadoop, check out Apache Hadoop 3 Quick Start Guide. Written by Hrishikesh Vijay Karambelkar, an innovator and enterprise architect with over 16 years of experience, the book follows a step-by-step approach to explain the underlying concepts in a hands-on way. Apache Hadoop 3 Quick Start Guide is for aspiring as well as experienced big data professionals who want to learn the essentials and get up to speed with the latest features of Hadoop 3.
