Configure Hadoop and start cluster services using Ansible Playbook

Mukul Jeveriya
4 min read · Jan 17, 2022


Let's start!

A Hadoop cluster needs one NameNode and at least one DataNode, so we launch those first. I use AWS EC2 instances for both.

See my previous blog post to learn how to launch EC2 instances with Ansible.

ansible-playbook setup.yml

The NameNode and DataNode instances are created successfully. Now create the playbooks that set them up.
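The playbooks in this post target two host groups named namenode and datanode (see their hosts: lines), so the inventory has to define both groups. A minimal sketch of such an inventory is shown here. The file name inventory.yml, the IP addresses, the key path and the root login are all placeholders I am assuming for illustration; replace them with the values of your own instances.

```yaml
# inventory.yml (hypothetical file name)
all:
  children:
    namenode:
      hosts:
        192.0.2.10:          # placeholder address of the NameNode instance
    datanode:
      hosts:
        192.0.2.11:          # placeholder address of the DataNode instance
  vars:
    # Assumption: the playbooks read and write under /root, so we log in as root
    ansible_user: root
    # Assumption: path to the private key of the EC2 key pair
    ansible_ssh_private_key_file: ~/.ssh/mykey.pem
```

With a file like this, a playbook can be run as ansible-playbook -i inventory.yml Namenode.yml.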

Creating Playbook

Create a directory as your workspace, for example mkdir task.
Inside this workspace, create a playbook (extension .yml), for example vim hadoop.yml. For this task, I created two separate playbooks, one for the NameNode and one for the DataNode.
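Because the NameNode and the DataNode each have their own playbook, the order in which they run matters: the NameNode should be up before the DataNode tries to reach it. One possible way to keep that order is a small wrapper playbook. The file name site.yml is my own choice for this sketch; Namenode.yml and datanode.yml are the two playbooks from this post.

```yaml
# site.yml (hypothetical wrapper, not part of the original setup)
# Playbooks are imported top to bottom, so the NameNode is configured first.
- import_playbook: Namenode.yml
- import_playbook: datanode.yml
```

Running ansible-playbook site.yml then configures both nodes in one go.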

Namenode.yml

- hosts: namenode
  vars:
    name_folder_name: "/namenode"
  tasks:

    # Upload each RPM only when it is not already on the managed node
    - name: "Check if hadoop software already exist or not"
      stat:
        path: "/root/hadoop-1.2.1-1.x86_64.rpm"
      register: check_hadoop_presence

    - name: "Copying hadoop software"
      copy:
        dest: "/root/"
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
      when: not check_hadoop_presence.stat.exists

    - name: "Check if jdk software already exist or not"
      stat:
        path: "/root/jdk-8u171-linux-x64.rpm"
      register: check_jdk_presence

    - name: "Copying jdk software"
      copy:
        dest: "/root/"
        src: "/root/jdk-8u171-linux-x64.rpm"
      when: not check_jdk_presence.stat.exists

    # A non-zero return code means the command is missing, so the play must
    # not stop on this error
    - name: "Check java version"
      shell: "java -version"
      register: check_java
      changed_when: false
      ignore_errors: true

    - name: "Installing jdk"
      shell: "rpm -ih /root/jdk-8u171-linux-x64.rpm"
      when: check_java.rc > 0

    - name: "Check hadoop version"
      shell: "hadoop version"
      register: check_hadoop
      changed_when: false
      ignore_errors: true

    - name: "Installing hadoop"
      shell: "rpm -ih /root/hadoop-1.2.1-1.x86_64.rpm --force"
      when: check_hadoop.rc > 0

    - name: "Check if folder already exist or not"
      stat:
        path: "{{ name_folder_name }}"
      register: check_folder

    - name: "Make directory for namenode"
      file:
        path: "{{ name_folder_name }}"
        state: directory
      when: not check_folder.stat.exists

    - name: "Configure hdfs-site.xml file"
      template:
        src: "hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"

    - name: "Configure core-site.xml file"
      template:
        src: "core.xml"
        dest: "/etc/hadoop/core-site.xml"

    - name: "jps command"
      command: "jps"
      register: jps_check
      changed_when: false

    # Start the daemon only when jps does not already list a NameNode process
    - name: "Starting hadoop service"
      command: "hadoop-daemon.sh start namenode"
      when: "'NameNode' not in jps_check.stdout"

    - debug:
        var: jps_check.stdout_lines
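A note on the directory step in the playbook above: the file module with state: directory is idempotent, meaning it reports "ok" and changes nothing when the directory already exists. So the separate stat check is optional. A shorter variant of that step could look like this sketch, which reuses the name_folder_name variable from the playbook (the task name is my own wording):

```yaml
# Creates the directory only when it is missing; otherwise reports "ok"
- name: "Create the namenode directory if it is missing"
  file:
    path: "{{ name_folder_name }}"
    state: directory
```

The stat-then-act pattern is still useful for the copy tasks, where skipping the upload of a large RPM saves time.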

datanode.yml

- hosts: datanode
  vars:
    data_folder_name: "/dn"
  tasks:

    # Upload each RPM only when it is not already on the managed node
    - name: "Check if hadoop software already exist or not"
      stat:
        path: "/root/hadoop-1.2.1-1.x86_64.rpm"
      register: check_hadoop_presence

    - name: "Copying hadoop software"
      copy:
        dest: "/root/"
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
      when: not check_hadoop_presence.stat.exists

    - name: "Check if jdk software already exist or not"
      stat:
        path: "/root/jdk-8u171-linux-x64.rpm"
      register: check_jdk_presence

    - name: "Copying jdk software"
      copy:
        dest: "/root/"
        src: "/root/jdk-8u171-linux-x64.rpm"
      when: not check_jdk_presence.stat.exists

    # A non-zero return code means the command is missing, so the play must
    # not stop on this error
    - name: "Check java version"
      shell: "java -version"
      register: check_java_data
      changed_when: false
      ignore_errors: true

    - name: "Installing jdk"
      shell: "rpm -ih /root/jdk-8u171-linux-x64.rpm"
      when: check_java_data.rc > 0

    - name: "Check hadoop version"
      shell: "hadoop version"
      register: check_hadoop_data
      changed_when: false
      ignore_errors: true

    - name: "Installing hadoop"
      shell: "rpm -ih /root/hadoop-1.2.1-1.x86_64.rpm --force"
      when: check_hadoop_data.rc > 0

    - name: "Check if folder already exist or not"
      stat:
        path: "{{ data_folder_name }}"
      register: check_folder

    - name: "Make directory for datanode"
      file:
        path: "{{ data_folder_name }}"
        state: directory
      when: not check_folder.stat.exists

    - name: "Configure hdfs-site.xml file"
      template:
        src: "hdfs_data.xml"
        dest: "/etc/hadoop/hdfs-site.xml"

    - name: "Configure core-site.xml file"
      template:
        src: "core_data.xml"
        dest: "/etc/hadoop/core-site.xml"

    - name: "jps command"
      command: "jps"
      register: jps_check_data
      changed_when: false

    # Start the daemon only when jps does not already list a DataNode process
    - name: "Starting hadoop service"
      command: "hadoop-daemon.sh start datanode"
      ignore_errors: true
      when: "'DataNode' not in jps_check_data.stdout"
      register: hadoop_service

    # Runs only when the previous task reported no change (for example, when
    # it was skipped); errors are ignored because the daemon may already be running
    - name: "Check hadoop service status"
      command: "hadoop-daemon.sh start datanode"
      ignore_errors: true
      when: not hadoop_service.changed

    - debug:
        var: hadoop_service

    - name: "jps command"
      command: "jps"
      register: jps_check_data1
      changed_when: false

    - debug:
        var: jps_check_data1

    - name: "Checking connectivity b/w namenode and datanode"
      command: "hadoop dfsadmin -report"
      register: hadoop_report
      changed_when: false

    - debug:
        var: hadoop_report

When the playbook that configures the DataNode runs, its debug tasks print the registered results to the console of the control node.

Finally, the output of the hadoop dfsadmin -report command appears on that same console, so there is no need to log in to any managed node.
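One small readability tweak, offered as a sketch: a registered result holds much more than the command output (return code, timestamps, stderr and so on), so printing the whole variable is noisy. Printing only stdout_lines shows the report one line per entry. The task name "Show the dfsadmin report" is my own; the rest reuses the names from the DataNode playbook.

```yaml
- name: "Checking connectivity b/w namenode and datanode"
  command: "hadoop dfsadmin -report"
  register: hadoop_report
  changed_when: false      # a read-only query should not be reported as a change

# Print only the command output, one list item per line of the report
- name: "Show the dfsadmin report"
  debug:
    var: hadoop_report.stdout_lines
```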

Thanks for reading. I hope you liked the blog!
