Configure Hadoop and start cluster services using Ansible Playbook

Jan 17, 2022

Let's start!

A Hadoop cluster needs one namenode and at least one datanode, so we launch those first. I use AWS EC2 instances for both the namenode and the datanode.

See my previous blog post to learn how to launch EC2 instances with Ansible.

Deploy Web Server on AWS through ANSIBLE!

Deploying a HTTP Server over AWS EC2 instance using Ansible automation.

mukuljeveriya.medium.com

ansible-playbook setup.yml

The namenode and datanode instances were created successfully. Now create the playbooks that set up the namenode and the datanode.

Creating Playbook

Create a directory as your workspace, for example mkdir task. Inside this workspace, create a playbook (extension .yml), for example vim hadoop.yml. For this task, I created two separate playbooks, one for the namenode and one for the datanode.
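
Both playbooks target inventory groups named namenode and datanode, so the inventory has to define them. Here is a minimal sketch of a YAML inventory, assuming one instance per group. The file name inventory.yml, the host aliases nn1 and dn1, the documentation-range IP addresses, and the root login are placeholders, not values from this setup.

```yaml
# inventory.yml (hypothetical): one host per group used by the playbooks
all:
  vars:
    ansible_user: root            # assumed; the playbooks write under /root
  children:
    namenode:
      hosts:
        nn1:
          ansible_host: 203.0.113.10   # placeholder address
    datanode:
      hosts:
        dn1:
          ansible_host: 203.0.113.11   # placeholder address
```

With such a file in the workspace, a playbook can be pointed at it with ansible-playbook -i inventory.yml followed by the playbook name.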

namenode.yml

- hosts: namenode
  vars:
    name_folder_name: "/namenode"
  tasks:
 

  - name: "Check whether the Hadoop RPM is already on the node"
    stat:
      path: "/root/hadoop-1.2.1-1.x86_64.rpm"
    register: check_hadoop_presence

  - name: "Copy the Hadoop RPM"
    copy:
      dest: "/root/"
      src: "/root/hadoop-1.2.1-1.x86_64.rpm"
    when: not check_hadoop_presence.stat.exists

  - name: "Check whether the JDK RPM is already on the node"
    stat:
      path: "/root/jdk-8u171-linux-x64.rpm"
    register: check_jdk_presence

  - name: "Copy the JDK RPM"
    copy:
      dest: "/root/"
      src: "/root/jdk-8u171-linux-x64.rpm"
    when: not check_jdk_presence.stat.exists

  - name: "Check Java version"
    shell: "java -version"
    register: check_java
    changed_when: false
    # Do not abort the play when Java is missing; the next task installs it.
    failed_when: false

  - name: "Install the JDK"
    shell: "rpm -ih /root/jdk-8u171-linux-x64.rpm"
    when: "check_java.rc > 0"

  - name: "Check Hadoop version"
    shell: "hadoop version"
    register: check_hadoop
    changed_when: false
    # Do not abort the play when Hadoop is missing; the next task installs it.
    failed_when: false

  - name: "Install Hadoop"
    shell: "rpm -ih /root/hadoop-1.2.1-1.x86_64.rpm --force"
    when: "check_hadoop.rc > 0"

  - name: "Check whether the namenode directory already exists"
    stat:
      path: "{{ name_folder_name }}"
    register: check_folder

  - name: "Create the directory for the namenode"
    file:
      path: "{{ name_folder_name }}"
      state: directory
    when: not check_folder.stat.exists

  - name: "Configure hdfs-site.xml file"
    template:
      src: "hdfs-site.xml"
      dest: "/etc/hadoop/hdfs-site.xml"

  - name: "Configure core-site.xml file"
    template:
      src: "core.xml"
      dest: "/etc/hadoop/core-site.xml"

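  - name: "List running Java processes with jps"
    command: "jps"
    register: jps_check
    changed_when: false

  - name: "Start the NameNode service"
    command: "hadoop-daemon.sh start namenode"
    # Start only when jps does not already list a NameNode process.
    when: "'NameNode' not in jps_check.stdout"

  - debug:
      var: jps_check.stdout_lines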
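
The playbook above renders two templates, hdfs-site.xml and core.xml, whose contents are not shown in this post. As a rough sketch of what they might contain for Hadoop 1.x, the tasks below write comparable files inline with the copy module instead of template. dfs.name.dir and fs.default.name are the Hadoop 1.x property names. The 0.0.0.0 bind address and port 9001 are assumptions, not values taken from the original templates.

```yaml
# Hypothetical inline equivalents of the hdfs-site.xml and core.xml templates.
# These tasks would sit in the tasks list of the namenode play.
- name: "Write hdfs-site.xml for the namenode"
  copy:
    dest: "/etc/hadoop/hdfs-site.xml"
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>dfs.name.dir</name>
          <value>{{ name_folder_name }}</value>
        </property>
      </configuration>

- name: "Write core-site.xml for the namenode"
  copy:
    dest: "/etc/hadoop/core-site.xml"
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>fs.default.name</name>
          <!-- bind address and port are assumed values -->
          <value>hdfs://0.0.0.0:9001</value>
        </property>
      </configuration>
```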

datanode.yml

- hosts: datanode
  vars:
    data_folder_name: "/dn"
  tasks:
 

  - name: "Check whether the Hadoop RPM is already on the node"
    stat:
      path: "/root/hadoop-1.2.1-1.x86_64.rpm"
    register: check_hadoop_presence

  - name: "Copy the Hadoop RPM"
    copy:
      dest: "/root/"
      src: "/root/hadoop-1.2.1-1.x86_64.rpm"
    when: not check_hadoop_presence.stat.exists

  - name: "Check whether the JDK RPM is already on the node"
    stat:
      path: "/root/jdk-8u171-linux-x64.rpm"
    register: check_jdk_presence

  - name: "Copy the JDK RPM"
    copy:
      dest: "/root/"
      src: "/root/jdk-8u171-linux-x64.rpm"
    when: not check_jdk_presence.stat.exists

  - name: "Check Java version"
    shell: "java -version"
    register: check_java_data
    changed_when: false
    ignore_errors: true

  - name: "Install the JDK"
    shell: "rpm -ih /root/jdk-8u171-linux-x64.rpm"
    when: "check_java_data.rc > 0"

  - name: "Check Hadoop version"
    shell: "hadoop version"
    register: check_hadoop_data
    changed_when: false
    ignore_errors: true

  - name: "Install Hadoop"
    shell: "rpm -ih /root/hadoop-1.2.1-1.x86_64.rpm --force"
    when: "check_hadoop_data.rc > 0"

  - name: "Check whether the datanode directory already exists"
    stat:
      path: "{{ data_folder_name }}"
    register: check_folder

  - name: "Create the directory for the datanode"
    file:
      path: "{{ data_folder_name }}"
      state: directory
    when: not check_folder.stat.exists

  - name: "Configure hdfs-site.xml file"
    template:
      src: "hdfs_data.xml"
      dest: "/etc/hadoop/hdfs-site.xml"

  - name: "Configure core-site.xml file"
    template:
      src: "core_data.xml"
      dest: "/etc/hadoop/core-site.xml"

  - name: "List running Java processes with jps"
    command: "jps"
    register: jps_check_data
    changed_when: false

  - name: "Start the DataNode service"
    command: "hadoop-daemon.sh start datanode"
    ignore_errors: true
    # Start only when jps does not already list a DataNode process.
    when: "'DataNode' not in jps_check_data.stdout"
    register: hadoop_service

  - debug:
      var: hadoop_service

  - name: "Run jps again to confirm the DataNode is running"
    command: "jps"
    register: jps_check_data1
    changed_when: false

  - debug:
      var: jps_check_data1

  - name: "Check connectivity between the namenode and the datanode"
    command: "hadoop dfsadmin -report"
    register: hadoop_report
    changed_when: false

  - debug:
      var: hadoop_report
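
The datanode templates hdfs_data.xml and core_data.xml are likewise not shown in this post. A hedged sketch of comparable content, again written inline with the copy module: dfs.data.dir points at the datanode directory, and fs.default.name points at the namenode. The variable namenode_ip and port 9001 are hypothetical and would have to match whatever the namenode actually uses.

```yaml
# Hypothetical inline equivalents of hdfs_data.xml and core_data.xml.
# These tasks would sit in the tasks list of the datanode play.
# namenode_ip is an assumed variable; define it in vars or in the inventory.
- name: "Write hdfs-site.xml for the datanode"
  copy:
    dest: "/etc/hadoop/hdfs-site.xml"
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>dfs.data.dir</name>
          <value>{{ data_folder_name }}</value>
        </property>
      </configuration>

- name: "Write core-site.xml for the datanode"
  copy:
    dest: "/etc/hadoop/core-site.xml"
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>fs.default.name</name>
          <!-- port is an assumed value and must match the namenode -->
          <value>hdfs://{{ namenode_ip }}:9001</value>
        </property>
      </configuration>
```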

When the playbook that configures the datanode runs, the following output is produced:

Finally, you can see the output of the hadoop dfsadmin -report command on the control node's console, without logging in to any managed node.
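
Because hadoop dfsadmin -report on the datanode needs a running namenode, the namenode playbook has to run before the datanode playbook. One way to fix that order, sketched here with a hypothetical wrapper file named site.yml, is import_playbook:

```yaml
# site.yml (hypothetical wrapper): run the namenode play first, then the datanode play.
# Adjust the file names to match the playbooks in your workspace.
- import_playbook: namenode.yml
- import_playbook: datanode.yml
```

Running ansible-playbook site.yml then executes both plays in sequence from the control node.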

Thanks for reading. I hope you liked the blog!

Written by Mukul Jeveriya