Configure Hadoop and start cluster services using Ansible Playbook

Let's start!

For a Hadoop cluster we first need to launch one NameNode and at least one DataNode. I use AWS EC2 instances for both the NameNode and the DataNode.

Refer to my previous blog to see how to launch an EC2 instance via Ansible.

ansible-playbook setup.yml

The NameNode and DataNode instances are created successfully. Now let's create the playbooks that set them up.

Creating Playbook

Create a directory as your workspace, for example mkdir task.
Inside this workspace, create a playbook (extension .yml), for example
vim hadoop.yml. For this task, I created two separate playbooks: one for the NameNode and one for the DataNode.
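Both playbooks target host groups named namenode and datanode, so the Ansible inventory must define them. A minimal inventory sketch, assuming the instances were launched with a key pair (the IPs, user, and key path below are placeholders; substitute your own):

```ini
[namenode]
13.233.x.x    ; hypothetical public IP of the NameNode instance

[datanode]
65.0.x.x      ; hypothetical public IP of the DataNode instance

[all:vars]
ansible_user=ec2-user
ansible_ssh_private_key_file=/root/mykey.pem   ; hypothetical key path
ansible_become=yes
```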

Namenode.yml

- hosts: namenode
  vars:
    name_folder_name: "/namenode"
  tasks:

    - name: "Check if hadoop software already exists or not"
      stat:
        path: "/root/hadoop-1.2.1-1.x86_64.rpm"
      register: check_hadoop_presence

    - name: "Copying hadoop software"
      copy:
        dest: "/root/"
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
      when: not check_hadoop_presence.stat.exists

    - name: "Check if jdk software already exists or not"
      stat:
        path: "/root/jdk-8u171-linux-x64.rpm"
      register: check_jdk_presence

    - name: "Copying jdk software"
      copy:
        dest: "/root"
        src: "/root/jdk-8u171-linux-x64.rpm"
      when: not check_jdk_presence.stat.exists

    - name: "Check java version"
      shell: "java -version"
      register: check_java
      changed_when: false
      ignore_errors: true

    - name: "Installing jdk"
      shell: "rpm -ih jdk-8u171-linux-x64.rpm"
      when: check_java.rc > 0

    - name: "Check hadoop version"
      shell: "hadoop version"
      register: check_hadoop
      changed_when: false
      ignore_errors: true

    - name: "Installing hadoop"
      shell: "rpm -ih hadoop-1.2.1-1.x86_64.rpm --force"
      when: check_hadoop.rc > 0

    - name: "Check if folder already exists or not"
      stat:
        path: "{{ name_folder_name }}"
      register: check_folder

    - name: "Make directory for namenode"
      file:
        path: "{{ name_folder_name }}"
        state: directory
      when: not check_folder.stat.exists

    - name: "Configure hdfs-site.xml file"
      template:
        src: "hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"

    - name: "Configure core-site.xml file"
      template:
        src: "core.xml"
        dest: "/etc/hadoop/core-site.xml"

    - name: "jps command"
      command: "jps"
      register: jps_check
      changed_when: false

    - name: "Starting hadoop service"
      command: "hadoop-daemon.sh start namenode"
      when: "'NameNode' not in jps_check.stdout"

    - debug:
        var: jps_check.stdout_lines
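The playbook above assumes two Jinja2 template files, hdfs-site.xml and core.xml, sitting next to it in the workspace. A minimal sketch of what they might contain for Hadoop 1.x (the property values and port are illustrative, not taken from the original post):

```xml
<!-- hdfs-site.xml : template for the NameNode's /etc/hadoop/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>{{ name_folder_name }}</value>
  </property>
</configuration>

<!-- core.xml : template for /etc/hadoop/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
```

Note that on a fresh NameNode the storage directory must be formatted once before the daemon is started for the first time. The original playbook does not include this step; a hedged extra task for it could look like:

```yaml
    - name: "Format namenode storage (first run only)"
      shell: "echo Y | hadoop namenode -format"
      when: not check_folder.stat.exists
```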

datanode.yml

- hosts: datanode
  vars:
    data_folder_name: "/dn"
  tasks:

    - name: "Check if hadoop software already exists or not"
      stat:
        path: "/root/hadoop-1.2.1-1.x86_64.rpm"
      register: check_hadoop_presence

    - name: "Copying hadoop software"
      copy:
        dest: "/root/"
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
      when: not check_hadoop_presence.stat.exists

    - name: "Check if jdk software already exists or not"
      stat:
        path: "/root/jdk-8u171-linux-x64.rpm"
      register: check_jdk_presence

    - name: "Copying jdk software"
      copy:
        dest: "/root"
        src: "/root/jdk-8u171-linux-x64.rpm"
      when: not check_jdk_presence.stat.exists

    - name: "Check java version"
      shell: "java -version"
      register: check_java_data
      changed_when: false
      ignore_errors: true

    - name: "Installing jdk"
      shell: "rpm -ih jdk-8u171-linux-x64.rpm"
      when: check_java_data.rc > 0

    - name: "Check hadoop version"
      shell: "hadoop version"
      register: check_hadoop_data
      changed_when: false
      ignore_errors: true

    - name: "Installing hadoop"
      shell: "rpm -ih hadoop-1.2.1-1.x86_64.rpm --force"
      when: check_hadoop_data.rc > 0

    - name: "Check if folder already exists or not"
      stat:
        path: "{{ data_folder_name }}"
      register: check_folder

    - name: "Make directory for datanode"
      file:
        path: "{{ data_folder_name }}"
        state: directory
      when: not check_folder.stat.exists

    - name: "Configure hdfs-site.xml file"
      template:
        src: "hdfs_data.xml"
        dest: "/etc/hadoop/hdfs-site.xml"

    - name: "Configure core-site.xml file"
      template:
        src: "core_data.xml"
        dest: "/etc/hadoop/core-site.xml"

    - name: "jps command"
      command: "jps"
      register: jps_check_data
      changed_when: false

    - name: "Starting hadoop service"
      command: "hadoop-daemon.sh start datanode"
      ignore_errors: true
      when: "'DataNode' not in jps_check_data.stdout"
      register: hadoop_service

    - debug:
        var: hadoop_service

    - name: "jps command"
      command: "jps"
      register: jps_check_data1
      changed_when: false

    - debug:
        var: jps_check_data1

    - name: "Checking connectivity b/w namenode and datanode"
      command: "hadoop dfsadmin -report"
      register: hadoop_report
      changed_when: false

    - debug:
        var: hadoop_report
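The DataNode playbook likewise assumes two templates, hdfs_data.xml and core_data.xml. A minimal sketch under the same assumptions as before; namenode_ip is a hypothetical variable (or you can hard-code the NameNode's address), and the port must match the one the NameNode listens on:

```xml
<!-- hdfs_data.xml : template for the DataNode's /etc/hadoop/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>{{ data_folder_name }}</value>
  </property>
</configuration>

<!-- core_data.xml : template for /etc/hadoop/core-site.xml,
     pointing the DataNode at the NameNode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://{{ namenode_ip }}:9001</value>
  </property>
</configuration>
```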

When the playbook configuring the DataNode runs, it produces the output shown at the console.

Finally, we can see the output of the hadoop dfsadmin -report command right at the console, without logging in to any managed node.

Thanks for reading, and I hope you liked the blog!

Mukul Jeveriya
