Categories
SysOps

How to configure Hadoop topology mapping

Configure Hadoop topology mapping using TXT DNS records.

By default, the whole cluster is placed in a single rack, but creating a discovery shell script will ensure that HDFS block placement will use rack awareness for fault tolerance.

$ hdfs dfsadmin -printTopology
Rack: /default-rack
   192.168.8.173:9866 (datanode1.example.org)
   192.168.8.174:9866 (datanode2.example.org)
   192.168.8.175:9866 (datanode3.example.org)

I will assume that datanode1 and datanode3 are located in rack red, datanode2 in rack blue.

Ensure that reverse lookups for mapping addresses to names work as expected.

$ dig +short -x 192.168.8.173
datanode1.example.org.
$ dig +short -x 192.168.8.174
datanode2.example.org.
$ dig +short -x 192.168.8.175
datanode3.example.org.

Configure DNS TXT records to return the appropriate rack for each datanode.

$ dig +short _location.datanode1.example.org. TXT
"red"
$ dig +short _location.datanode2.example.org. TXT
"blue"
$ dig +short _location.datanode3.example.org. TXT
"red"

Configure core-site.xml on namenode.

        <property>
                <name>net.topology.node.switch.mapping.impl</name>
                <value>org.apache.hadoop.net.ScriptBasedMapping</value>
        </property>
        <property>
                <name>net.topology.script.file.name</name>
                <value>/opt/hadoop/hadoop-3.2.2/bin/topology.sh</value>
        </property>
        <property>
                <name>net.topology.script.number.arg</name>
                <value>1</value>
        </property>

Create a discovery shell script on namenode that will return the rack for each IP address.

cat <<EOF | sudo tee /opt/hadoop/hadoop-3.2.2/bin/topology.sh
#!/bin/bash
dig +short -x \$1 | \
        xargs -I{} dig +short TXT _location.{} | tr -d '"' | \
        awk 'END{if(NR==0){print "/rack-default"} else {print "/rack-"\$1}}'
EOF

Ensure that the executable bit is set.

$ sudo chmod +x /opt/hadoop/hadoop-3.2.2/bin/topology.sh

Inspect results.

$ /opt/hadoop/hadoop-3.2.2/bin/topology.sh 192.168.8.173
/rack-red
$ /opt/hadoop/hadoop-3.2.2/bin/topology.sh 192.168.8.174
/rack-blue
$ /opt/hadoop/hadoop-3.2.2/bin/topology.sh 192.168.8.175
/rack-red
$ /opt/hadoop/hadoop-3.2.2/bin/topology.sh 192.168.8.179
/rack-default

Retart namenode service.

$ sudo systemctl restart hadoop-namenode.service 

Verify topology.

$ hdfs dfsadmin -printTopology
Rack: /rack-blue
   192.168.8.174:9866 (datanode2.example.org)

Rack: /rack-red
   192.168.8.173:9866 (datanode1.example.org)
   192.168.8.175:9866 (datanode3.example.org)

Additional notes

Rack Awareness