Providing Elasticity To Hadoop Cluster Using The Concept Of LVM

Tribhuban Mishra
6 min read · Mar 20, 2021

In this article, we are going to see how to provide elasticity to our DataNode storage, so that in the future we can increase or decrease the storage of the DataNodes on demand without disturbing the cluster.

Before we get started, let's understand some basic concepts.

What is LVM?

LVM (Logical Volume Manager) is a tool for logical volume management, which includes allocating disks, striping, mirroring, and resizing logical volumes. The concept behind LVM is very similar to virtualization. Generally, we have a physical device that is divided into multiple partitions, each with a file system on it that is used to manage that partition. With LVM, we can create as many logical storage volumes on top of a single storage device as we want, and the logical volumes thus created can be expanded or shrunk according to our growing or shrinking storage needs.

Architecture Of LVM

To create an LVM logical volume, physical volumes are combined into a volume group. This creates a pool of disk space out of which logical volumes can be allocated. The process is analogous to the way a disk is divided into partitions. A logical volume is then used by file systems and applications.
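As a quick sketch of this hierarchy using the LVM command-line tools (the device, group, and volume names here are only placeholders for illustration, not the ones used later in this article):

pvcreate /dev/sdb /dev/sdc

vgcreate myvg /dev/sdb /dev/sdc

lvcreate --size 5G --name mylv myvg

First the disks are initialised as physical volumes, then pooled into a volume group, and finally a logical volume is carved out of that pool.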

Below are some advantages of using Logical volumes over using physical storage directly:

  • Resize storage pools: You can extend the logical space as well as reduce it without reformatting the disks.
  • Flexible storage capacity: You can add more space by adding new disks to the pool of physical storage, giving you flexible storage capacity.

Why We Need To Provide The Elasticity To DataNode?

Consider a Hadoop cluster running with 1000 DataNodes, and the storage of all the DataNodes gets exhausted. In this case we have two options: either add more DataNodes to the Hadoop cluster or increase the storage of the existing DataNodes. Adding more DataNodes needs more RAM and CPU, so it is better to increase the size of the existing DataNodes. But then we encounter one more problem: we cannot increase the size of a static partition. Here we need elasticity, through which we can increase the size of the partitions on the fly, and this is where the concept of LVM comes in.

Prerequisites :

  1. A Hadoop cluster with at least one NameNode (master node) and one DataNode configured.
  2. Basic concepts of Linux partitioning.

Platform Used :

In this task, I have used Red Hat Linux on top of AWS.

Implementation steps to provide Elasticity:

  1. Attach the EBS volumes to the instance where the DataNode is running.
  2. Create the physical volumes, volume group, and logical volume.
  3. Format the logical volume and mount it on a directory.
  4. Share the directory mounted on the logical volume with the DataNode.
  5. Increase or decrease the storage size of the logical volume.

Before implementing the concept of elasticity, let's see how much storage has been shared by the DataNode with the NameNode. In my case, the directory (/datanode1) that I have shared with the DataNode is created inside the root (/) directory, so by default it contributes the whole volume of the root directory to the NameNode.

The volume of the root (/) directory is 10 GiB.
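To check how much storage the DataNode is currently contributing, we can run the report command from the Hadoop client or NameNode (assuming the cluster is already running):

hadoop dfsadmin -report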

Let’s start with the practical part.

1. Attach the EBS volumes to the instance where the DataNode is running.

Step-1 : Create the EBS volumes and attach them to the instance.

Here, I have created 2 EBS volumes, DataNode-1 and DataNode-2, of 10 GiB each.

Step-2 : Check whether the storage has been attached to the instance where the DataNode is running.

fdisk -l

As you can see, the volumes /dev/xvdf and /dev/xvdg of 10 GiB each have been attached to the instance.

2. Create the physical volumes, volume group, and logical volume.

Step-1 : Install the software that provides the commands to create and manage LVM volumes.

yum install lvm2 -y

Step-2 : Create physical volumes from both of the volumes we attached, and check the details of the physical volumes using the pvdisplay command.

For First Volume :

pvcreate /dev/xvdf

pvdisplay /dev/xvdf

For Second Volume :

pvcreate /dev/xvdg

pvdisplay /dev/xvdg

Step-3 : Create the volume group from which the logical volume will get its storage, and display the details of the volume group using the following commands:

vgcreate hadoopvg /dev/xvdf /dev/xvdg

vgdisplay hadoopvg

Step-4 : Let’s create a logical volume of size 15 GiB, which will take its storage from the 20 GiB volume group created above.

lvcreate --size 15G --name hadooplv hadoopvg

lvdisplay hadoopvg/hadooplv

3. Format the logical volume and mount it to the directory.

Step-1 : Format the Logical volume partition.

mkfs.ext4 /dev/hadoopvg/hadooplv

Step-2 : Create the directory and mount the logical volume on it.

mkdir /datanode2

mount /dev/hadoopvg/hadooplv /datanode2
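We can quickly verify the mount with df (just a sanity check; the size shown may be slightly less than 15 GiB because of file-system overhead):

df -h /datanode2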

4. Share the directory mounted on the logical volume with the DataNode.

Step-1 : Configure the hdfs-site.xml file in the DataNode so that it points to the new directory, as sketched below.
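A minimal sketch of the relevant entry, assuming /datanode2 is the directory we just mounted (the property is dfs.datanode.data.dir in Hadoop 2.x and later, or dfs.data.dir in Hadoop 1.x; your file may contain other properties as well):

<configuration>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/datanode2</value>
    </property>
</configuration>

After saving the file, restart the DataNode daemon so that it picks up the new storage directory.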

Step-2 : Check the volume that has been shared with the NameNode with the help of the following command:

hadoop dfsadmin -report

Now we can see that the configured capacity is almost 15 GB, which is the size of the logical volume we created.

5. Increase the storage size of the logical volume.

Step-1 : Increase the size of the logical volume using the following command:

lvextend --size +3G /dev/hadoopvg/hadooplv

Step-2 : Extend the file system over the extra 3 GB using the resize2fs command, as shown below. If we formatted the partition with the mkfs.ext4 command instead, it would reformat the whole logical volume and erase the data.
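A sketch of this step — running resize2fs without an explicit size grows the ext4 file system to fill the (now larger) logical volume, and growing can be done online while the volume is still mounted:

resize2fs /dev/hadoopvg/hadooplv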

Step-3 : Now, again check the storage that has been shared with the NameNode by the DataNode.

As we can see here, the configured capacity is approximately 17 GB. In this way we have increased the size of the partition and the storage capacity of the DataNode.

6. Decrease the storage size of logical volume.

Step-1 : Unmount the partition

umount /datanode2

Step-2 : Clean/scan the partition

e2fsck -f /dev/hadoopvg/hadooplv

Step-3 : Shrink the file system to 16 GB using the resize2fs command.

resize2fs /dev/hadoopvg/hadooplv 16G

Step-4 : Reduce the size of the logical volume to match the new file-system size, and then mount the partition again.

lvreduce --size 16G /dev/hadoopvg/hadooplv
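Finally, mount the (now smaller) volume back on the same directory so the DataNode can keep using it:

mount /dev/hadoopvg/hadooplv /datanode2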

In this way, we have integrated LVM with Hadoop to provide elasticity to the Hadoop cluster.
