Configure Hadoop topology mapping using TXT DNS records.
By default, the whole cluster is placed in a single rack, but creating a discovery shell script will ensure that HDFS block placement will use rack awareness for fault tolerance.
$ hdfs dfsadmin -printTopology
Rack: /default-rack 192.168.8.173:9866 (datanode1.example.org) 192.168.8.174:9866 (datanode2.example.org) 192.168.8.175:9866 (datanode3.example.org)
I will assume that datanode1
and datanode3
are located in rack red
, datanode2
in rack blue
.
Ensure that reverse lookups for mapping addresses to names work as expected.
$ dig +short -x 192.168.8.173
datanode1.example.org.
$ dig +short -x 192.168.8.174
datanode2.example.org.
$ dig +short -x 192.168.8.175
datanode3.example.org.
Configure DNS TXT records to return the appropriate rack for each datanode.
$ dig +short _location.datanode1.example.org. TXT
"red"
$ dig +short _location.datanode2.example.org. TXT
"blue"
$ dig +short _location.datanode3.example.org. TXT
"red"
Configure core-site.xml
on namenode.
<property> <name>net.topology.node.switch.mapping.impl</name> <value>org.apache.hadoop.net.ScriptBasedMapping</value> </property> <property> <name>net.topology.script.file.name</name> <value>/opt/hadoop/hadoop-3.2.2/bin/topology.sh</value> </property> <property> <name>net.topology.script.number.arg</name> <value>1</value> </property>
Create a discovery shell script on namenode that will return the rack for each IP address.
cat <<EOF | sudo tee /opt/hadoop/hadoop-3.2.2/bin/topology.sh #!/bin/bash dig +short -x \$1 | \ xargs -I{} dig +short TXT _location.{} | tr -d '"' | \ awk 'END{if(NR==0){print "/rack-default"} else {print "/rack-"\$1}}' EOF
Ensure that the executable bit is set.
$ sudo chmod +x /opt/hadoop/hadoop-3.2.2/bin/topology.sh
Inspect results.
$ /opt/hadoop/hadoop-3.2.2/bin/topology.sh 192.168.8.173
/rack-red
$ /opt/hadoop/hadoop-3.2.2/bin/topology.sh 192.168.8.174
/rack-blue
$ /opt/hadoop/hadoop-3.2.2/bin/topology.sh 192.168.8.175
/rack-red
$ /opt/hadoop/hadoop-3.2.2/bin/topology.sh 192.168.8.179
/rack-default
Retart namenode service.
$ sudo systemctl restart hadoop-namenode.service
Verify topology.
$ hdfs dfsadmin -printTopology
Rack: /rack-blue 192.168.8.174:9866 (datanode2.example.org) Rack: /rack-red 192.168.8.173:9866 (datanode1.example.org) 192.168.8.175:9866 (datanode3.example.org)