Configure Hadoop Cluster Using Ansible Playbook.

Feb 17, 2021

What is Hadoop?

Hadoop is a framework that lets you store Big Data in a distributed environment so that you can process it in parallel.

What is Ansible?

Ansible is an open-source software provisioning, configuration management, and application-deployment tool enabling infrastructure as code.

What is a Cluster?

A cluster is a group of inter-connected computers that work together to perform computationally intensive tasks. In a cluster, each computer is referred to as a “node”. A cluster has a small number of “head nodes”, usually one or two, and a large number of “compute nodes”.

Let’s write the playbooks for configuring the Hadoop cluster.

To configure a Hadoop cluster we need at least two systems. One will be the namenode (in Hadoop, the master node is known as the namenode) and the other will be the datanode (a system that connects to the namenode and stores the data is known as a datanode).

Let’s write the playbook to configure the namenode:
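As a rough illustration of what such a playbook can look like, here is a minimal sketch. It is not the author's original playbook (that one is in the GitHub repository linked at the end). It assumes Java and Hadoop 2.x or later are already installed on the target, that the Hadoop commands are on the PATH, and that the remote user is allowed to write to the paths used. The variable names, the /etc/hadoop and /nn paths, and port 9001 are all hypothetical choices.

```yaml
# namenode.yml: a sketch under the assumptions stated above.
- hosts: namenode
  vars:
    hadoop_conf_dir: /etc/hadoop   # assumed Hadoop config directory
    namenode_dir: /nn              # assumed directory for namenode metadata
    namenode_port: 9001            # assumed RPC port
  tasks:
    - name: Create the directory that holds the namenode metadata
      ansible.builtin.file:
        path: "{{ namenode_dir }}"
        state: directory
        mode: "0755"

    - name: Tell HDFS where the namenode keeps its metadata
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/hdfs-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.namenode.name.dir</name>
              <value>{{ namenode_dir }}</value>
            </property>
          </configuration>

    - name: Publish the namenode address
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/core-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <value>hdfs://{{ inventory_hostname }}:{{ namenode_port }}</value>
            </property>
          </configuration>

    - name: Format the namenode once (skipped if metadata already exists)
      ansible.builtin.command:
        cmd: hdfs namenode -format -nonInteractive
        creates: "{{ namenode_dir }}/current/VERSION"

    - name: Start the namenode daemon
      ansible.builtin.command:
        cmd: hadoop-daemon.sh start namenode
      register: start_result
      # Assumption: an already running daemon reports "running as process".
      failed_when: >-
        start_result.rc != 0 and
        'running as process' not in (start_result.stdout + start_result.stderr)
      changed_when: start_result.rc == 0
```

Because the inventory lists hosts by IP address, `inventory_hostname` resolves to the namenode's IP here. On Hadoop 1.x the property names differ (`dfs.name.dir` and `fs.default.name`), so adjust the sketch to your version.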

This playbook, saved as namenode.yml, configures the namenode. Before running it, we have to update the inventory file.

In my case, the inventory file is /etc/ip.txt. Replace the IP address, username, and password with those of your own system:

vim /etc/ip.txt
[namenode]
192.168.43.28  ansible_user=username ansible_ssh_pass=password ansible_connection=ssh

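Save the inventory file and run the playbook (if /etc/ip.txt is not set as the inventory in your ansible.cfg, add -i /etc/ip.txt to the command):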

ansible-playbook namenode.yml

After the playbook runs, your namenode is configured.

Let’s write the playbook for the datanode:
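A datanode playbook can follow the same pattern. The sketch below is again not the author's original file and makes the same assumptions: Java and Hadoop 2.x or later are installed, the Hadoop commands are on the PATH, and the remote user can write to the paths used. The /dn directory, the variable names, and port 9001 are hypothetical. The port must match whatever the namenode actually listens on.

```yaml
# datanode.yml: a sketch under the assumptions stated above.
- hosts: datanode
  vars:
    hadoop_conf_dir: /etc/hadoop   # assumed Hadoop config directory
    datanode_dir: /dn              # assumed directory for HDFS data blocks
    namenode_port: 9001            # assumed namenode RPC port
    # First host in the [namenode] inventory group
    namenode_host: "{{ groups['namenode'][0] }}"
  tasks:
    - name: Create the directory that stores HDFS blocks
      ansible.builtin.file:
        path: "{{ datanode_dir }}"
        state: directory
        mode: "0755"

    - name: Tell HDFS where the datanode stores its blocks
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/hdfs-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.datanode.data.dir</name>
              <value>{{ datanode_dir }}</value>
            </property>
          </configuration>

    - name: Point the datanode at the namenode
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/core-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <value>hdfs://{{ namenode_host }}:{{ namenode_port }}</value>
            </property>
          </configuration>

    - name: Start the datanode daemon
      ansible.builtin.command:
        cmd: hadoop-daemon.sh start datanode
      register: start_result
      # Assumption: an already running daemon reports "running as process".
      failed_when: >-
        start_result.rc != 0 and
        'running as process' not in (start_result.stdout + start_result.stderr)
      changed_when: start_result.rc == 0
```

The `groups` lookup is why the inventory keeps the [namenode] group even when only datanode.yml is run: the datanodes read the namenode's address from it.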

This playbook, saved as datanode.yml, configures the datanode. As before, update the inventory file before running it:

vim /etc/ip.txt
[namenode]
192.168.43.28  ansible_user=username ansible_ssh_pass=password ansible_connection=ssh

[datanode]
192.168.43.83  ansible_user=username ansible_ssh_pass=password ansible_connection=ssh

Save it and run your playbook:

ansible-playbook datanode.yml

If you want more than one datanode, add the details of each additional system under [datanode] in your inventory:

vim /etc/ip.txt
[namenode]
192.168.43.28  ansible_user=username ansible_ssh_pass=password ansible_connection=ssh

[datanode]
192.168.43.83  ansible_user=username ansible_ssh_pass=password ansible_connection=ssh
# Replace IP_of_second_datanode with the address of your second datanode
IP_of_second_datanode  ansible_user=username ansible_ssh_pass=password ansible_connection=ssh

The inventory file above defines two datanodes.
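Once the datanode playbook has run against every host in the group, one way to confirm that the datanodes registered is to ask the namenode for a cluster report. This is a small sketch, assuming Hadoop 2.x or later with the `hdfs` command on the PATH. The file name check.yml is hypothetical.

```yaml
# check.yml: prints the HDFS cluster report from the namenode.
- hosts: namenode
  gather_facts: false
  tasks:
    - name: Ask the namenode which datanodes have registered
      ansible.builtin.command:
        cmd: hdfs dfsadmin -report
      register: report
      changed_when: false   # read-only query, never changes the host

    - name: Show the report
      ansible.builtin.debug:
        var: report.stdout_lines
```

Each live datanode should appear in the report with its address and capacity.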

Let’s check the complete structure of our workspace

You can also find the playbooks on GitHub:

https://github.com/sabir69261/hadoop-Ansible.git

Thanks for reading

If you find any issues or want to improve this, you can contact me on Gmail.