Configure a Hadoop Cluster Using an Ansible Playbook
What is Hadoop?
Hadoop is a framework that lets you store Big Data in a distributed environment so that you can process it in parallel.
What is Ansible?
Ansible is an open-source software provisioning, configuration management, and application-deployment tool enabling infrastructure as code.
What is a Cluster?
A cluster is a group of inter-connected computers that work together to perform computationally intensive tasks. In a cluster, each computer is referred to as a “node”. A cluster has a small number of “head nodes”, usually one or two, and a large number of “compute nodes”.
Let’s write the playbooks for configuring the Hadoop cluster:
To configure a Hadoop cluster we need at least two systems. One will be the namenode (in Hadoop, the master node is known as the namenode) and the other will be the datanode (a system that connects to the namenode is known as a datanode).
Let’s write the playbook to configure the namenode:
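As a rough illustration of what such a playbook might contain, here is a minimal sketch. It is not the author's playbook (that one is in the GitHub repository linked at the end of this post). It assumes Hadoop 1.x and a JDK are already installed on the target, that the Hadoop configuration lives in /etc/hadoop, and that the hadoop commands are on the PATH. The directory /nn and port 9001 are hypothetical values chosen for the example.

```yaml
# namenode.yml: illustrative sketch only, not the author's playbook.
# Assumptions: Hadoop 1.x and a JDK are already installed, the config
# directory is /etc/hadoop, and the hadoop commands are on the PATH.
- hosts: namenode
  become: true
  vars:
    nn_dir: /nn      # hypothetical metadata directory
    nn_port: 9001    # hypothetical namenode port
  tasks:
    - name: Create the namenode metadata directory
      ansible.builtin.file:
        path: "{{ nn_dir }}"
        state: directory
        mode: "0755"

    - name: Point HDFS at the metadata directory
      ansible.builtin.copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.name.dir</name>
              <value>{{ nn_dir }}</value>
            </property>
          </configuration>

    - name: Set the address the namenode listens on
      ansible.builtin.copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://0.0.0.0:{{ nn_port }}</value>
            </property>
          </configuration>

    - name: Format the namenode (skipped once the directory is formatted)
      ansible.builtin.shell:
        cmd: echo Y | hadoop namenode -format
        creates: "{{ nn_dir }}/current"

    - name: Start the namenode daemon
      ansible.builtin.command: hadoop-daemon.sh start namenode
      register: nn_start
      changed_when: nn_start.rc == 0
      # Treat "already running" as success so that reruns do not fail.
      failed_when: nn_start.rc != 0 and 'running as process' not in nn_start.stdout
```

The `creates` argument keeps the format step from wiping an existing namenode when the playbook is run again. On newer Hadoop releases the property names and commands differ, so treat the values above as placeholders to adapt.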
The namenode playbook configures the namenode for you. Before running it, we have to update our inventory file.
In my case the inventory file is /etc/ip.txt:
vim /etc/ip.txt
[namenode]
192.168.43.28 ansible_user=username ansible_ssh_pass=password ansible_connection=ssh
Replace the IP address, username, and password with the values for your own system, then save the inventory file and run the playbook:
ansible-playbook namenode.yml
Once the playbook has run, your namenode is configured.
Let’s write the playbook for the datanode:
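A datanode playbook usually mirrors the namenode one, with the main difference that core-site.xml must point at the namenode's address. The sketch below is again only an illustration under the same assumptions as before (Hadoop 1.x already installed, configuration in /etc/hadoop). The directory /dn and port 9001 are hypothetical, and the namenode address is taken from the first host in the [namenode] inventory group, which only works when that inventory entry is an IP address or a resolvable name.

```yaml
# datanode.yml: illustrative sketch only, not the author's playbook.
# Assumptions: Hadoop 1.x and a JDK are already installed, the config
# directory is /etc/hadoop, and the hadoop commands are on the PATH.
- hosts: datanode
  become: true
  vars:
    dn_dir: /dn                             # hypothetical storage directory
    nn_port: 9001                           # must match the namenode's port
    nn_address: "{{ groups['namenode'][0] }}" # first host in [namenode]
  tasks:
    - name: Create the datanode storage directory
      ansible.builtin.file:
        path: "{{ dn_dir }}"
        state: directory
        mode: "0755"

    - name: Point HDFS at the storage directory
      ansible.builtin.copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.data.dir</name>
              <value>{{ dn_dir }}</value>
            </property>
          </configuration>

    - name: Tell the datanode where the namenode is
      ansible.builtin.copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://{{ nn_address }}:{{ nn_port }}</value>
            </property>
          </configuration>

    - name: Start the datanode daemon
      ansible.builtin.command: hadoop-daemon.sh start datanode
      register: dn_start
      changed_when: dn_start.rc == 0
      # Treat "already running" as success so that reruns do not fail.
      failed_when: dn_start.rc != 0 and 'running as process' not in dn_start.stdout
```

Because the play targets the whole datanode group, adding more hosts to that group in the inventory is enough to configure additional datanodes with the same playbook.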
The datanode playbook configures the datanode for you. As before, update the inventory file before running it:
vim /etc/ip.txt
[namenode]
192.168.43.28 ansible_user=username ansible_ssh_pass=password ansible_connection=ssh
[datanode]
192.168.43.83 ansible_user=username ansible_ssh_pass=password ansible_connection=ssh
Save it and run your playbook:
ansible-playbook datanode.yml
If you want more than one datanode, add one line per additional system under the [datanode] group in your inventory:
vim /etc/ip.txt
[namenode]
192.168.43.28 ansible_user=username ansible_ssh_pass=password ansible_connection=ssh
[datanode]
192.168.43.83 ansible_user=username ansible_ssh_pass=password ansible_connection=ssh
IP_of_second_datanode ansible_user=username ansible_ssh_pass=password ansible_connection=ssh
This inventory file describes two datanodes. Replace IP_of_second_datanode with the address of your second machine.
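To confirm that every datanode in the inventory has actually joined the cluster, one option is a small check playbook that asks the namenode for a report. This is a sketch that assumes a Hadoop 1.x style `hadoop dfsadmin -report` command is available on the namenode. The file name verify.yml is hypothetical.

```yaml
# verify.yml: illustrative sketch only; the file name is hypothetical.
# Assumes the Hadoop 1.x command "hadoop dfsadmin -report" is on the PATH.
- hosts: namenode
  tasks:
    - name: Ask the namenode for a cluster report
      ansible.builtin.command: hadoop dfsadmin -report
      register: report
      changed_when: false   # read-only check, never reports a change

    - name: Show the report
      ansible.builtin.debug:
        var: report.stdout_lines
```

The report should list each connected datanode along with its configured capacity. If a datanode is missing, check that its core-site.xml points at the right namenode address and port.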
Let’s check the complete structure of our workspace.
You can also find the playbooks on GitHub:
https://github.com/sabir69261/hadoop-Ansible.git
Thanks for reading!
If you find any issues or want to improve this, you can connect with me on