Automate your Hadoop Cluster (CDH) – Part 2

The goal of Part 1 was to install and configure Cloudera Manager and set up its repository from the command line.

The goal of Part 2 is to install and configure the CDH services using the Cloudera Manager (CM) API.

  • Part 1: Install Cloudera Manager (cluster) like a boss.
  • ** Part 2: Add services and configure CDH using the API **
  • Part 3: Secure your CDH Cloudera server with Kerberos
  • Part 4: Configure TLS encryption on CDH services

Create CDH Services Using API

  • Install the jq tool

    jq is a great tool for parsing and building JSON documents from the Unix shell. We will use jq to create JSON content.

    More information on jq is available at

    https://stedolan.github.io/jq

    Use the following commands to install jq

sudo yum install epel-release
sudo yum install jq
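
As a quick taste of both uses of jq, here is a minimal sketch. The function names `list_hostnames` and `cluster_body` are made up for this example, and the sample input in the usage note is only shaped like a Cloudera Manager list response.

```shell
# Hypothetical helper: pull the host names out of a /hosts style response
# read from stdin. -r prints raw strings without the JSON quotes.
list_hostnames () {
  jq -r '.items[].hostname'
}

# Hypothetical helper: build a small request body. --arg passes a shell
# value into jq as a string, so jq takes care of quoting and escaping.
cluster_body () {
  jq -n -c --arg name "$1" --arg version "$2" \
    '{items: [{name: $name, version: "CDH5", fullVersion: $version}]}'
}
```

For example, `echo '{"items":[{"hostname":"data-1.example.com"}]}' | list_hostnames` prints the host name on its own line, and `cluster_body cdhcluster-01 5.13.1` prints a one-line JSON body.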
  • Start Trial
basicauth=admin:admin
apiURL=http://$(hostname -f):7180/api/v19
curl -i -X POST -u "${basicauth}"  ${apiURL}/cm/trial/begin
  • Register all Data node hosts with Cloudera Manager
basicauth=admin:admin 
apiURL=http://$(hostname -f):7180/api/v19

export DATA_NODE_HOSTS='"data-1.example.com", "data-2.example.com", "data-3.example.com"'
# Join the key into a single line, ending every original line with a literal \n for JSON
PRIVATE_KEY=$(sed 's/$/\\n/' ~cloudera/.ssh/id_rsa | paste -sd '' -)
echo '{ "hostNames": ['"${DATA_NODE_HOSTS}"'],
 "userName" : "cloudera", "privateKey":"'"${PRIVATE_KEY}"'",
 "unlimitedJCE":"true", "javaInstallStrategy":"NONE" }' > /tmp/hostInstall.json

curl -i -X POST -u "${basicauth}" -H "content-type:application/json" -d @'/tmp/hostInstall.json'    ${apiURL}/cm/commands/hostInstall

The response should look like this:

HTTP/1.1 200 OK
Expires: Thu, 01-Jan-1970 00:00:00 GMT
Set-Cookie: CLOUDERA_MANAGER_SESSIONID=7j26djm83ly1usy2rl2sbj89;Path=/;HttpOnly
Content-Type: application/json
Date: Sat, 22 Dec 2018 19:23:05 GMT
Transfer-Encoding: chunked
Server: Jetty(6.1.26.cloudera.4)

{
  "id" : 12,
  "name" : "GlobalHostInstall",
  "startTime" : "2018-12-22T19:23:05.009Z",
  "active" : true,
  "children" : {
    "items" : [ ]
  }
}
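
The request body above can also be built with jq, which escapes the newlines in the private key for us. This is only a sketch: the function name `host_install_payload` is hypothetical, and the trailing newline added to the key is an assumption that mirrors what the sed command produces.

```shell
# Hypothetical helper: print the hostInstall request body.
# Usage: host_install_payload KEY_FILE USER HOST [HOST...]
host_install_payload () {
  key_file=$1
  user=$2
  shift 2
  # One host per line -> JSON strings -> slurped into the hostNames array.
  # $(cat ...) drops the final newline of the key, so it is added back.
  printf '%s\n' "$@" | jq -R . | jq -s \
    --arg user "$user" \
    --arg key "$(cat "$key_file")" \
    '{hostNames: ., userName: $user, privateKey: ($key + "\n"),
      unlimitedJCE: "true", javaInstallStrategy: "NONE"}'
}
```

A possible use would be `host_install_payload ~cloudera/.ssh/id_rsa cloudera data-1.example.com data-2.example.com > /tmp/hostInstall.json`, followed by the same curl call as above.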
  • Check for Installed hosts
basicauth=admin:admin 
apiURL=http://$(hostname -f):7180/api/v19

curl -i -X GET -u "${basicauth}" -H "content-type:application/json"     ${apiURL}/hosts |grep "hostname"

Wait until all hosts are installed.
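
If you would rather not re-run the check by hand, the wait can be scripted. The sketch below assumes that a host showing up in the `/hosts` listing is a good enough signal, as the check above implies; polling the command id returned by hostInstall would be an alternative. The function name `wait_for_hosts` and its defaults are made up for this example.

```shell
# Hypothetical helper: poll /hosts until at least N hosts are listed.
# Usage: wait_for_hosts EXPECTED_COUNT [MAX_TRIES]
wait_for_hosts () {
  expected=$1
  tries=${2:-120}
  count=0
  while [ "$tries" -gt 0 ]; do
    # Count the "hostname" lines in the response, like the grep above.
    # grep -c exits non-zero on zero matches, hence the "|| true".
    count=$(curl -sS -u "${basicauth}" "${apiURL}/hosts" | grep -c '"hostname"' || true)
    [ "$count" -ge "$expected" ] && return 0
    tries=$((tries - 1))
    sleep 10
  done
  echo "only ${count} of ${expected} hosts registered" >&2
  return 1
}
```

With three data nodes, `wait_for_hosts 3` would return once all three are listed, or fail after roughly 20 minutes.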

  • Install Cloudera Management Service
export basicauth=admin:admin
export apiURL=http://$(hostname -f):7180/api/v19
export POSTGRES_SERVER=cm.example.com
curl -i -X PUT -u "${basicauth}" -H "content-type:application/json" -d '{ "name": "mgmt" ,"displayName":"Cloudera Management Service"}' ${apiURL}/cm/service
curl -i -X PUT -u "${basicauth}" ${apiURL}/cm/service/autoAssignRoles  
curl -i -X PUT -u "${basicauth}" ${apiURL}/cm/service/autoConfigure

curl -u "${basicauth}" \
-H "Content-Type: application/json" \
-X POST \
--data '{}' \
${apiURL}/cm/service/commands/restart
  • Configure reporting database
curl -i -X PUT -u "${basicauth}"  \
      -H "content-type:application/json" \
      -d '{ "items": [{"name": "headlamp_database_host",     "value": "'${POSTGRES_SERVER}'"},
                      {"name": "headlamp_database_name",     "value": "reportman"},
                      {"name": "headlamp_database_password", "value": "averycomplexpassword"},
                      {"name": "headlamp_database_user",     "value": "reportman"},
                      {"name": "headlamp_database_type",     "value": "postgresql"}
          ]}'  \
    ${apiURL}/cm/service/roleConfigGroups/mgmt-REPORTSMANAGER-BASE/config
  • Delete Navigator Entry
curl -X DELETE -u "${basicauth}"  \
    ${apiURL}/cm/service/roles/$(curl -sS -X GET -u "${basicauth}"  ${apiURL}/cm/service/roles | grep -B1 '"type" : "NAVIGATORMETASERVER"' | grep name | cut -d'"' -f4)

curl -X DELETE -u "${basicauth}"  \
    ${apiURL}/cm/service/roles/$(curl -sS -X GET -u "${basicauth}"  ${apiURL}/cm/service/roles | grep -B1 '"type" : "NAVIGATOR"' | grep name | cut -d'"' -f4)
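
The grep and cut lookups above rely on the "name" line sitting directly above the "type" line in the pretty-printed response. Since jq is already installed, a less fragile variant is sketched below. The function names `role_names_by_type` and `delete_mgmt_roles` are hypothetical.

```shell
# Hypothetical helper: read a roles list (JSON) on stdin and print the
# name of every role whose type equals the first argument.
role_names_by_type () {
  jq -r --arg type "$1" '.items[] | select(.type == $type) | .name'
}

# Hypothetical wrapper: delete every management role of the given type.
delete_mgmt_roles () {
  for role in $(curl -sS -u "${basicauth}" "${apiURL}/cm/service/roles" | role_names_by_type "$1"); do
    curl -sS -X DELETE -u "${basicauth}" "${apiURL}/cm/service/roles/${role}"
  done
}
```

The two deletes above would then become `delete_mgmt_roles NAVIGATORMETASERVER` and `delete_mgmt_roles NAVIGATOR`.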

  • Create the cluster and add hosts
export clusterName=cdhcluster-01
curl -X POST -u "${basicauth}"  \
      -H "content-type:application/json" \
      -d '{ "items": [
              {
                "name": "'${clusterName}'",
                "version": "CDH5",
                "fullVersion":"5.13.1"
              }
          ] }'  \
 ${apiURL}/clusters

# Add hosts 
hostIds=$(curl -sS -X GET -u ${basicauth}  ${apiURL}/hosts |grep "hostId" | cut -d'"' -f4)
for hostid in ${hostIds};do
   curl -X POST -u "${basicauth}"  \
    -H 'content-type:application/json' \
    -d '{ "items": [ {"hostId": "'${hostid}'"} ]}'  \
  ${apiURL}/clusters/${clusterName}/hosts
done  
  • Distribute CDH Parcel
# define a function
wait_for_parcel () {
  wait_for=$1
  service=$2
  version=$3
  while [ 1 ]
  do
    curl -sS -X GET -u "${basicauth}"  ${apiURL}/clusters/${clusterName}/parcels/products/$service/versions/$version | grep '"stage" : "'$wait_for'"' && break
    sleep 5
  done
}
# Distribute parcel
service=CDH
PARCEL_VERSION=5.13.1-1.cdh5.13.1.p0.2
curl -i -X POST -u "${basicauth}"  ${apiURL}/clusters/${clusterName}/parcels/products/${service}/versions/${PARCEL_VERSION}/commands/startDistribution
# Check Status until DISTRIBUTED
wait_for_parcel DISTRIBUTED ${service} ${PARCEL_VERSION}
# Activate 
curl -i -X POST -u "${basicauth}"  ${apiURL}/clusters/${clusterName}/parcels/products/${service}/versions/${PARCEL_VERSION}/commands/activate
wait_for_parcel ACTIVATED $service $PARCEL_VERSION
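
Note that wait_for_parcel loops forever if the parcel never reaches the requested stage, for example when the version string is mistyped. A variant with an upper bound might look like the sketch below; the name `wait_for_parcel_stage` and the default of 360 attempts are assumptions for illustration.

```shell
# Hypothetical variant of wait_for_parcel that gives up after MAX_TRIES.
# Usage: wait_for_parcel_stage STAGE PRODUCT VERSION [MAX_TRIES]
wait_for_parcel_stage () {
  want=$1
  product=$2
  parcel=$3
  tries=${4:-360}
  url="${apiURL}/clusters/${clusterName}/parcels/products/${product}/versions/${parcel}"
  while [ "$tries" -gt 0 ]; do
    # Same check as wait_for_parcel: look for the stage in the response.
    if curl -sS -u "${basicauth}" "$url" | grep -q "\"stage\" : \"${want}\""; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 5
  done
  echo "parcel ${product} ${parcel} never reached ${want}" >&2
  return 1
}
```

It is called the same way, for example `wait_for_parcel_stage DISTRIBUTED CDH 5.13.1-1.cdh5.13.1.p0.2`, and its exit status can be used to stop the script.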
  • Distribute KAFKA Parcel
# define a function
wait_for_parcel () {
  wait_for=$1
  service=$2
  version=$3
  while [ 1 ]
  do
    curl -sS -X GET -u "${basicauth}"  ${apiURL}/clusters/${clusterName}/parcels/products/$service/versions/$version | grep '"stage" : "'$wait_for'"' && break
    sleep 5
  done
}
# Distribute parcel
service=KAFKA
PARCEL_VERSION=3.1.0-1.3.1.0.p0.35
curl -i -X POST -u "${basicauth}"  ${apiURL}/clusters/${clusterName}/parcels/products/${service}/versions/${PARCEL_VERSION}/commands/startDistribution
# Check Status until DISTRIBUTED
wait_for_parcel DISTRIBUTED ${service} ${PARCEL_VERSION}
# Activate 
curl -i -X POST -u "${basicauth}"  ${apiURL}/clusters/${clusterName}/parcels/products/${service}/versions/${PARCEL_VERSION}/commands/activate
wait_for_parcel ACTIVATED $service $PARCEL_VERSION
  • Create CDH Services
export basicauth=admin:admin
export apiURL=http://$(hostname -f):7180/api/v19
export clusterName=cdhcluster-01

curl -X POST -u "${basicauth}" \
-H "content-type:application/json" \
-d '{ "items": [  {"name": "zookeeper", "type": "ZOOKEEPER","displayName":"Zookeeper"}, {"name": "yarn", "type": "YARN","displayName":"YARN (MR2 Included)"}, {"name": "hdfs", "type": "HDFS","displayName":"HDFS"}, {"name": "kafka", "type": "KAFKA","displayName":"Kafka"}, {"name": "hbase", "type": "HBASE","displayName":"HBase"} ] }' \
${apiURL}/clusters/${clusterName}/services

Configure CDH Services Using API

  • HDFS
curl -i -u "${basicauth}" \
      -H "Content-Type: application/json" -i \
      -X PUT \
      --data '{
          "items" : [ {
            "name" : "zookeeper_service",
            "value" : "zookeeper",
            "sensitive" : false
          } ]
        }' \
${apiURL}/clusters/${clusterName}/services/hdfs/config
  • YARN
curl -i -u "${basicauth}" \
     -H "Content-Type: application/json" -i \
     -X PUT \
     --data '{
        "items" : [ {
          "name" : "hdfs_service",
          "value" : "hdfs",
          "sensitive" : false
        }, {
          "name" : "zookeeper_service",
          "value" : "zookeeper",
          "sensitive" : false
        } ]
      }' \
${apiURL}/clusters/${clusterName}/services/yarn/config
  • KAFKA
curl -i -u "${basicauth}" \
     -H "Content-Type: application/json" -i \
     -X PUT \
     --data '{
         "items" : [ {
           "name" : "zookeeper_service",
           "value" : "zookeeper",
           "sensitive" : false
         } ]
       }' \
${apiURL}/clusters/${clusterName}/services/kafka/config
  • HBASE
curl -i -u "${basicauth}" \
     -H "Content-Type: application/json" -i \
     -X PUT \
     --data '{
        "items" : [ {
          "name" : "hdfs_service",
          "value" : "hdfs",
          "sensitive" : false
        }, {
          "name" : "zookeeper_service",
          "value" : "zookeeper",
          "sensitive" : false
        } ]
      }' \
${apiURL}/clusters/${clusterName}/services/hbase/config
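
The four calls above differ only in the service name and the name/value pairs, so they can be folded into one small helper. This is a sketch: `config_items_json` and `set_service_config` are hypothetical names, the values are inserted without escaping (fine for simple names like these, not for values containing quotes), and the "sensitive" field is left out on the assumption that the server does not need it on input.

```shell
# Hypothetical helper: print a config body from NAME VALUE pairs.
config_items_json () {
  sep=''
  printf '{ "items" : [ '
  while [ "$#" -ge 2 ]; do
    printf '%s{ "name" : "%s", "value" : "%s" }' "$sep" "$1" "$2"
    sep=', '
    shift 2
  done
  printf ' ] }\n'
}

# Hypothetical wrapper: PUT the pairs to a service's config endpoint.
# Usage: set_service_config SERVICE NAME VALUE [NAME VALUE...]
set_service_config () {
  svc=$1
  shift
  # --data @- makes curl read the request body from stdin.
  config_items_json "$@" | curl -sS -u "${basicauth}" -X PUT \
    -H "Content-Type: application/json" --data @- \
    "${apiURL}/clusters/${clusterName}/services/${svc}/config"
}
```

The YARN step, for instance, would become `set_service_config yarn hdfs_service hdfs zookeeper_service zookeeper`.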

Create CDH services Roles

  • Generate hostIds
export basicauth=admin:admin
export apiURL=http://$(hostname -f):7180/api/v19
export clusterName=cdhcluster-01
# save the Cloudera host IDs of all data nodes
curl -sS -X GET -u "${basicauth}" "${apiURL}/hosts" | jq -r '.items[].hostId' > /tmp/all-hosts-id.txt
# save the Cloudera host ID of the first node
# this is used for the master roles and the NODEMANAGER
curl -sS -X GET -u "${basicauth}" "${apiURL}/hosts" | jq -r '.items[0].hostId' > /tmp/hosts-id-1.txt
  • configure zookeeper roles
export serviceName="zookeeper"  
export roleType="SERVER"
jq -R '.' /tmp/all-hosts-id.txt | jq -s '{items:map({type:"'$roleType'",hostRef:{hostId:.}})}' > /tmp/items-json-${serviceName}-${roleType}.txt
curl -i -X POST -u "${basicauth}" ${apiURL}/clusters/${clusterName}/services/${serviceName}/roles  -H "content-type:application/json" -d  @/tmp/items-json-${serviceName}-${roleType}.txt
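
The same two-step jq pipeline is reused for every role in the steps that follow. As a sketch, it can be written as a single jq call that receives the role type through `--arg` instead of splicing a shell variable into the filter. The function name `role_items_json` is hypothetical.

```shell
# Hypothetical helper: read host IDs (one per line) on stdin and print the
# request body that assigns ROLE_TYPE to each of those hosts.
# -R reads raw lines, -n plus "inputs" collects them all, and empty lines
# are skipped.
role_items_json () {
  jq -R -n --arg type "$1" \
    '{items: [inputs | select(length > 0) | {type: $type, hostRef: {hostId: .}}]}'
}
```

For the ZooKeeper step this would read `role_items_json SERVER < /tmp/all-hosts-id.txt > /tmp/items-json-zookeeper-SERVER.txt`, followed by the same curl call.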
  • configure hdfs roles
export serviceName="hdfs"  
export roleTypes="NAMENODE SECONDARYNAMENODE BALANCER"
for roleType in ${roleTypes};do
   jq -R '.' /tmp/hosts-id-1.txt | jq -s '{items:map({type:"'$roleType'",hostRef:{hostId:.}})}' > /tmp/items-json-${serviceName}-${roleType}.txt
   curl -i -X POST -u "${basicauth}" ${apiURL}/clusters/${clusterName}/services/${serviceName}/roles  -H "content-type:application/json" -d  @/tmp/items-json-${serviceName}-${roleType}.txt
done

export roleTypes="DATANODE"
for roleType in ${roleTypes};do
   jq -R '.' /tmp/all-hosts-id.txt | jq -s '{items:map({type:"'$roleType'",hostRef:{hostId:.}})}' > /tmp/items-json-${serviceName}-${roleType}.txt
   curl -i -X POST -u "${basicauth}" ${apiURL}/clusters/${clusterName}/services/${serviceName}/roles  -H "content-type:application/json" -d  @/tmp/items-json-${serviceName}-${roleType}.txt
done
  • configure hdfs directories
curl -i -u "${basicauth}" \
      -H "Content-Type: application/json" -i \
      -X PUT \
      --data '{
              "items" : [ {
                 "name" : "dfs_name_dir_list",
                 "value" : "/dfs/nn",
                 "sensitive" : false
              }  ]
            }' \
${apiURL}/clusters/${clusterName}/services/hdfs/roleConfigGroups/hdfs-NAMENODE-BASE/config

curl -i -u "${basicauth}" \
      -H "Content-Type: application/json" -i \
      -X PUT \
      --data '{
          "items" : [ {
             "name" : "fs_checkpoint_dir_list",
             "value" : "/dfs/snn",
             "sensitive" : false
          }
          ]
        }
        ' \
${apiURL}/clusters/${clusterName}/services/hdfs/roleConfigGroups/hdfs-SECONDARYNAMENODE-BASE/config
curl -i -u "${basicauth}" \
      -H "Content-Type: application/json" -i \
      -X PUT \
      --data '{
          "items" : [ {
            "name" : "dfs_data_dir_list",
            "value" : "/dfs/dn",
            "sensitive" : false
          }
          ]
        }' \
${apiURL}/clusters/${clusterName}/services/hdfs/roleConfigGroups/hdfs-DATANODE-BASE/config


  • configure hbase roles
export serviceName="hbase"  
export roleTypes="MASTER"
for roleType in ${roleTypes};do
   jq -R '.' /tmp/hosts-id-1.txt | jq -s '{items:map({type:"'$roleType'",hostRef:{hostId:.}})}' > /tmp/items-json-${serviceName}-${roleType}.txt
   curl -i -X POST -u "${basicauth}" ${apiURL}/clusters/${clusterName}/services/${serviceName}/roles  -H "content-type:application/json" -d  @/tmp/items-json-${serviceName}-${roleType}.txt
done

export roleTypes="REGIONSERVER"
for roleType in ${roleTypes};do
   jq -R '.' /tmp/all-hosts-id.txt | jq -s '{items:map({type:"'$roleType'",hostRef:{hostId:.}})}' > /tmp/items-json-${serviceName}-${roleType}.txt
   curl -i -X POST -u "${basicauth}" ${apiURL}/clusters/${clusterName}/services/${serviceName}/roles  -H "content-type:application/json" -d  @/tmp/items-json-${serviceName}-${roleType}.txt
done
  • configure yarn roles
export serviceName="yarn"  
export roleTypes="NODEMANAGER RESOURCEMANAGER JOBHISTORY"
for roleType in ${roleTypes};do
   jq -R '.' /tmp/hosts-id-1.txt | jq -s '{items:map({type:"'$roleType'",hostRef:{hostId:.}})}' > /tmp/items-json-${serviceName}-${roleType}.txt
   curl -i -X POST -u "${basicauth}" ${apiURL}/clusters/${clusterName}/services/${serviceName}/roles  -H "content-type:application/json" -d  @/tmp/items-json-${serviceName}-${roleType}.txt
done
  • configure yarn directories
curl -i -u "${basicauth}" \
     -H "Content-Type: application/json" -i \
     -X PUT \
     --data '{
         "items" : [ {
           "name" : "yarn_nodemanager_local_dirs",
           "value" : "/yarn/nm",
           "sensitive" : false
         }, {
           "name" : "yarn_nodemanager_log_dirs",
           "value" : "/yarn/container-logs",
           "sensitive" : false
         }]
       }' \
${apiURL}/clusters/${clusterName}/services/yarn/roleConfigGroups/yarn-NODEMANAGER-BASE/config

  • configure kafka roles
export serviceName="kafka"  
export roleTypes="KAFKA_BROKER"
for roleType in ${roleTypes};do
	jq -R '.' /tmp/all-hosts-id.txt | jq -s '{items:map({type:"'$roleType'",hostRef:{hostId:.}})}' > /tmp/items-json-${serviceName}-${roleType}.txt
	curl -i -X POST -u "${basicauth}" ${apiURL}/clusters/${clusterName}/services/${serviceName}/roles  -H "content-type:application/json" -d  @/tmp/items-json-${serviceName}-${roleType}.txt
done

Execute firstRun

  • Deploy Client Config
curl -u "${basicauth}" \
    -H "Content-Type:application/json" \
    -X POST  \
    -i \
    ${apiURL}/clusters/${clusterName}/commands/deployClientConfig

The output is a JSON document that contains the command ID. In this case it is 51, but yours will vary.

{
  "id" : 51,
  "name" : "DeployClusterClientConfig",
  "startTime" : "2018-12-25T06:35:00.200Z",
  "active" : true,
  "clusterRef" : {
    "clusterName" : "cdhcluster-01"
  }
}

Wait until the command has finished successfully:

"success" : true

The following command can be used to check the status:

export commandId=51
curl -i -X GET -u ${basicauth} ${apiURL}/commands/${commandId}
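
The waiting can be scripted as well. The sketch below polls the command until its top-level "active" field turns false and then reports whether "success" is true. It uses jq rather than grep so that fields of nested child commands, if the response includes any, are not mistaken for the top-level ones. The function name `wait_for_command` and the retry defaults are assumptions.

```shell
# Hypothetical helper: wait for a command to finish.
# Returns 0 only if the finished command reports "success" : true.
# Usage: wait_for_command COMMAND_ID [MAX_TRIES]
wait_for_command () {
  id=$1
  tries=${2:-360}
  while [ "$tries" -gt 0 ]; do
    status=$(curl -sS -u "${basicauth}" "${apiURL}/commands/${id}")
    if [ "$(printf '%s' "$status" | jq -r '.active')" = "false" ]; then
      if [ "$(printf '%s' "$status" | jq -r '.success')" = "true" ]; then
        return 0
      fi
      return 1
    fi
    tries=$((tries - 1))
    sleep 5
  done
  echo "command ${id} is still active, giving up" >&2
  return 1
}
```

With the id from the output above, `wait_for_command 51 && echo "client config deployed"` would block until the deployment is done.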

First Run

  • zookeeper
export serviceName=zookeeper
curl -X POST -u ${basicauth}  ${apiURL}/clusters/${clusterName}/services/${serviceName}/commands/firstRun

Wait until the command has finished successfully:

"success" : true

The following command can be used to check the status:

export commandId=<commandId>
curl -i -X GET -u ${basicauth} ${apiURL}/commands/${commandId}
  • hdfs
export serviceName=hdfs
curl -X POST -u ${basicauth}  ${apiURL}/clusters/${clusterName}/services/${serviceName}/commands/firstRun

Wait until the command has finished successfully:

"success" : true

The following command can be used to check the status:

export commandId=<commandId>
curl -i -X GET -u ${basicauth} ${apiURL}/commands/${commandId}
  • hbase
export serviceName=hbase
curl -X POST -u ${basicauth}  ${apiURL}/clusters/${clusterName}/services/${serviceName}/commands/firstRun

Wait until the command has finished successfully:

"success" : true

The following command can be used to check the status:

export commandId=<commandId>
curl -i -X GET -u ${basicauth} ${apiURL}/commands/${commandId}
  • yarn
export serviceName=yarn
curl -X POST -u ${basicauth}  ${apiURL}/clusters/${clusterName}/services/${serviceName}/commands/firstRun

Wait until the command has finished successfully:

"success" : true

The following command can be used to check the status:

export commandId=<commandId>
curl -i -X GET -u ${basicauth} ${apiURL}/commands/${commandId}
  • kafka
export serviceName=kafka
curl -X POST -u ${basicauth}  ${apiURL}/clusters/${clusterName}/services/${serviceName}/commands/firstRun

Wait until the command has finished successfully:

"success" : true

The following command can be used to check the status:

export commandId=<commandId>
curl -i -X GET -u ${basicauth} ${apiURL}/commands/${commandId}
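
Instead of copying the command id out of each response by hand, it can be captured directly. A minimal sketch, assuming the firstRun response has the same shape as the deployClientConfig output shown earlier (a JSON object with an "id" field); the function name `first_run` is made up for this example.

```shell
# Hypothetical helper: start firstRun for a service and print only the
# id of the command that Cloudera Manager created for it.
# Usage: first_run SERVICE_NAME
first_run () {
  curl -sS -X POST -u "${basicauth}" \
    "${apiURL}/clusters/${clusterName}/services/$1/commands/firstRun" | jq -r '.id'
}
```

Each step above could then start with `export commandId=$(first_run zookeeper)` (or hdfs, hbase, yarn, kafka) and continue with the same status check.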

At this point we should have a working Cloudera cluster. Part 3 will cover the security configuration using the CM API.
