How to create a Hadoop data node whitelist

Create a whitelist of the data nodes that are permitted to connect to the namenode.

Display the HDFS report to see which data nodes are currently live.

$ hdfs dfsadmin -report
Configured Capacity: 63010750464 (58.68 GB)
Present Capacity: 52264609737 (48.68 GB)
DFS Remaining: 49384169472 (45.99 GB)
DFS Used: 2880440265 (2.68 GB)
DFS Used%: 5.51%
Replicated Blocks:
        Under replicated blocks: 0
        Blocks with corrupt replicas: 0
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0
Erasure Coded Block Groups: 
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.8.173:9866 (datanode1.example.org)
Hostname: datanode1.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 435533613 (415.36 MB)
Non DFS Used: 2481141971 (2.31 GB)
DFS Remaining: 16996388864 (15.83 GB)
DFS Used%: 2.07%
DFS Remaining%: 80.92%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:15:46 UTC 2021
Last Block Report: Sun Jun 06 22:14:10 UTC 2021
Num of Blocks: 3297


Name: 192.168.8.174:9866 (datanode2.example.org)
Hostname: datanode2.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 1431298048 (1.33 GB)
Non DFS Used: 2471190528 (2.30 GB)
DFS Remaining: 16010575872 (14.91 GB)
DFS Used%: 6.81%
DFS Remaining%: 76.23%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:15:46 UTC 2021
Last Block Report: Sun Jun 06 22:14:10 UTC 2021
Num of Blocks: 10879


Name: 192.168.8.175:9866 (datanode3.example.org)
Hostname: datanode3.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 1013608604 (966.65 MB)
Non DFS Used: 2522251108 (2.35 GB)
DFS Remaining: 16377204736 (15.25 GB)
DFS Used%: 4.83%
DFS Remaining%: 77.97%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:15:45 UTC 2021
Last Block Report: Sun Jun 06 22:14:58 UTC 2021
Num of Blocks: 7582
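
The report is long, so pulling out just the hostnames can help when deciding what goes into the whitelist. The following is a minimal sketch, not part of the original procedure; report_hostnames is a hypothetical helper name, and it assumes the report keeps the "Hostname: ..." line format shown above.

```shell
# Print the hostname of every data node found in an
# "hdfs dfsadmin -report" dump read from stdin, sorted and de-duplicated.
# report_hostnames is a hypothetical helper, not an HDFS command.
report_hostnames() {
  awk -F': ' '/^Hostname: /{print $2}' | sort -u
}

# Usage, on a host with a configured HDFS client:
#   hdfs dfsadmin -report -live | report_hostnames
```

The -live flag limits the report to live data nodes, which keeps the list to nodes that are registered right now.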

Create a hosts.include file on the namenode, listing one permitted data node per line.

$ cat <<EOF | sudo -u hadoop tee /opt/hadoop/hadoop-3.2.2/etc/hadoop/hosts.include
datanode1.example.org
datanode2.example.org
EOF
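
Before the whitelist takes effect, it may be worth checking which currently live data nodes are missing from it, since those nodes will be dropped. A small sketch under that assumption; not_whitelisted is a hypothetical helper name, and it expects one hostname per line on stdin.

```shell
# Print every hostname read from stdin that does NOT appear
# as a whole line in the include file given as $1.
# not_whitelisted is a hypothetical helper, not an HDFS command.
not_whitelisted() {
  # -v invert match, -x whole-line match, -F fixed strings, -f patterns from file
  grep -vxFf "$1" || true
}

# Usage, on a host with a configured HDFS client:
#   hdfs dfsadmin -report -live | awk -F': ' '/^Hostname: /{print $2}' \
#     | not_whitelisted /opt/hadoop/hadoop-3.2.2/etc/hadoop/hosts.include
```

With the two-line hosts.include created above, this would be expected to print datanode3.example.org.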

Set the dfs.hosts property in hdfs-site.xml on the namenode so that it points to this file.

$ sudo -u hadoop vim /opt/hadoop/hadoop-3.2.2/etc/hadoop/hdfs-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>dfs.name.dir</name>
                <value>/opt/hadoop/local_data/namenode</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>https://secondarynamenode.example.org:9870</value>
        </property>

        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.hosts</name>
                <value>/opt/hadoop/hadoop-3.2.2/etc/hadoop/hosts.include</value>
        </property>
</configuration>
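
A quick sanity check before restarting can catch a mistyped path. This sketch assumes the hdfs client on the namenode reads the same hdfs-site.xml; check_include_file is a hypothetical helper name, and the check should be run as the user that runs the namenode so that the permissions match.

```shell
# Succeed only if the file given as $1 exists, is readable and is not empty.
# check_include_file is a hypothetical helper, not an HDFS command.
check_include_file() {
  [ -r "$1" ] && [ -s "$1" ]
}

# Usage, on the namenode:
#   check_include_file "$(hdfs getconf -confKey dfs.hosts)" && echo "include file looks usable"
```

hdfs getconf -confKey prints the value the client resolves for a configuration key, so the helper tests the path that is actually configured rather than a hand-typed one.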

Restart the namenode service.

$ sudo systemctl restart hadoop-namenode.service

Inspect the data nodes. Only the two whitelisted nodes are still live.

$ hdfs dfsadmin -report 
Configured Capacity: 42007166976 (39.12 GB)
Present Capacity: 34873788205 (32.48 GB)
DFS Remaining: 33006956544 (30.74 GB)
DFS Used: 1866831661 (1.74 GB)
DFS Used%: 5.35%
Replicated Blocks:
        Under replicated blocks: 7578
        Blocks with corrupt replicas: 0
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Low redundancy blocks with highest priority to recover: 7578
        Pending deletion blocks: 0
Erasure Coded Block Groups: 
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.8.173:9866 (datanode1.example.org)
Hostname: datanode1.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 435533613 (415.36 MB)
Non DFS Used: 2481146067 (2.31 GB)
DFS Remaining: 16996384768 (15.83 GB)
DFS Used%: 2.07%
DFS Remaining%: 80.92%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:19:16 UTC 2021
Last Block Report: Sun Jun 06 22:18:49 UTC 2021
Num of Blocks: 3299


Name: 192.168.8.174:9866 (datanode2.example.org)
Hostname: datanode2.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 1431298048 (1.33 GB)
Non DFS Used: 2471194624 (2.30 GB)
DFS Remaining: 16010571776 (14.91 GB)
DFS Used%: 6.81%
DFS Remaining%: 76.23%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:19:16 UTC 2021
Last Block Report: Sun Jun 06 22:18:49 UTC 2021
Num of Blocks: 10879
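
The report above also shows 7578 under-replicated blocks, which is plausibly the result of dfs.replication being 2 while one replica holder is no longer admitted. To keep an eye on that number without reading the whole report, a sketch along these lines may help; under_replicated is a hypothetical helper name and it assumes the "Under replicated blocks: N" line format shown above.

```shell
# Print the under-replicated block count from an
# "hdfs dfsadmin -report" dump read from stdin.
# under_replicated is a hypothetical helper, not an HDFS command.
under_replicated() {
  awk -F': ' '/Under replicated blocks:/{print $2}'
}

# Usage, on a host with a configured HDFS client:
#   hdfs dfsadmin -report | under_replicated
```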

Append the data node that needs to be whitelisted to the hosts.include file on the namenode (one host per line).

$ echo datanode3.example.org | sudo -u hadoop tee -a /opt/hadoop/hadoop-3.2.2/etc/hadoop/hosts.include

Re-read the include file on the namenode, and make sure that the datanode service is running on the newly whitelisted data node.

$ hdfs dfsadmin -refreshNodes
Refresh nodes successful
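
A data node may take a moment to register again after the refresh. The following sketch checks whether a given host already appears in the report; is_live is a hypothetical helper name, and it relies on the "Hostname: ..." line format used by the report.

```shell
# Succeed if the hostname given as $1 appears as a "Hostname:" line
# in an "hdfs dfsadmin -report" dump read from stdin.
# is_live is a hypothetical helper, not an HDFS command.
is_live() {
  grep -qxF "Hostname: $1"
}

# Usage, on a host with a configured HDFS client:
#   hdfs dfsadmin -report -live | is_live datanode3.example.org && echo "datanode3 is registered"
```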

Display the HDFS report again. All three data nodes are live and no blocks are under-replicated.

$ hdfs dfsadmin -report
Configured Capacity: 63010750464 (58.68 GB)
Present Capacity: 52258835796 (48.67 GB)
DFS Remaining: 49377067008 (45.99 GB)
DFS Used: 2881768788 (2.68 GB)
DFS Used%: 5.51%
Replicated Blocks:
        Under replicated blocks: 0
        Blocks with corrupt replicas: 0
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0
Erasure Coded Block Groups: 
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.8.173:9866 (datanode1.example.org)
Hostname: datanode1.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 449625849 (428.80 MB)
Non DFS Used: 2480767239 (2.31 GB)
DFS Remaining: 16982671360 (15.82 GB)
DFS Used%: 2.14%
DFS Remaining%: 80.86%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:23:16 UTC 2021
Last Block Report: Sun Jun 06 22:18:49 UTC 2021
Num of Blocks: 3455


Name: 192.168.8.174:9866 (datanode2.example.org)
Hostname: datanode2.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 1418534335 (1.32 GB)
Non DFS Used: 2470548033 (2.30 GB)
DFS Remaining: 16023982080 (14.92 GB)
DFS Used%: 6.75%
DFS Remaining%: 76.29%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:23:16 UTC 2021
Last Block Report: Sun Jun 06 22:18:49 UTC 2021
Num of Blocks: 10721


Name: 192.168.8.175:9866 (datanode3.example.org)
Hostname: datanode3.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 1013608604 (966.65 MB)
Non DFS Used: 2529042276 (2.36 GB)
DFS Remaining: 16370413568 (15.25 GB)
DFS Used%: 4.83%
DFS Remaining%: 77.94%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:23:15 UTC 2021
Last Block Report: Sun Jun 06 22:23:09 UTC 2021
Num of Blocks: 7582

You can also exclude hosts with the dfs.hosts.exclude option, which takes priority over this whitelist.
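
As an illustration only, the snippet below prints a property element for dfs.hosts.exclude in the same layout as the hdfs-site.xml shown earlier. hdfs_property is a hypothetical helper name, and the hosts.exclude path is an assumption that simply mirrors the hosts.include location; neither comes from the original setup.

```shell
# Print an hdfs-site.xml <property> element for the name $1 and value $2.
# hdfs_property is a hypothetical helper, not an HDFS command.
hdfs_property() {
  printf '        <property>\n'
  printf '                <name>%s</name>\n' "$1"
  printf '                <value>%s</value>\n' "$2"
  printf '        </property>\n'
}

# Assumed path for the exclude file (not taken from the original setup).
hdfs_property dfs.hosts.exclude /opt/hadoop/hadoop-3.2.2/etc/hadoop/hosts.exclude
```

The printed element would go inside the configuration element of hdfs-site.xml, with the exclude file holding one hostname per line, following the same edit and restart steps used for dfs.hosts.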