Create a Hadoop data node whitelist.
Display HDFS report.
$ hdfs dfsadmin -report
Configured Capacity: 63010750464 (58.68 GB)
Present Capacity: 52264609737 (48.68 GB)
DFS Remaining: 49384169472 (45.99 GB)
DFS Used: 2880440265 (2.68 GB)
DFS Used%: 5.51%
Replicated Blocks:
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0
Erasure Coded Block Groups:
    Low redundancy block groups: 0
    Block groups with corrupt internal blocks: 0
    Missing block groups: 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.8.173:9866 (datanode1.example.org)
Hostname: datanode1.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 435533613 (415.36 MB)
Non DFS Used: 2481141971 (2.31 GB)
DFS Remaining: 16996388864 (15.83 GB)
DFS Used%: 2.07%
DFS Remaining%: 80.92%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:15:46 UTC 2021
Last Block Report: Sun Jun 06 22:14:10 UTC 2021
Num of Blocks: 3297

Name: 192.168.8.174:9866 (datanode2.example.org)
Hostname: datanode2.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 1431298048 (1.33 GB)
Non DFS Used: 2471190528 (2.30 GB)
DFS Remaining: 16010575872 (14.91 GB)
DFS Used%: 6.81%
DFS Remaining%: 76.23%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:15:46 UTC 2021
Last Block Report: Sun Jun 06 22:14:10 UTC 2021
Num of Blocks: 10879

Name: 192.168.8.175:9866 (datanode3.example.org)
Hostname: datanode3.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 1013608604 (966.65 MB)
Non DFS Used: 2522251108 (2.35 GB)
DFS Remaining: 16377204736 (15.25 GB)
DFS Used%: 4.83%
DFS Remaining%: 77.97%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:15:45 UTC 2021
Last Block Report: Sun Jun 06 22:14:58 UTC 2021
Num of Blocks: 7582
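The hostnames in this report are the values that go into the whitelist, so it can be handy to pull them out as a plain list. The following is a minimal sketch, assuming the report keeps the `Hostname:` field shown above; the function name `live_datanodes` is made up for illustration.

```shell
# Hypothetical helper: read a dfsadmin report on stdin and print one
# data node hostname per line, taken from the "Hostname:" fields.
live_datanodes() {
  awk '$1 == "Hostname:" { print $2 }'
}

# Intended use on a cluster (not run here):
#   hdfs dfsadmin -report -live | live_datanodes
```

The `-live` filter restricts the report to live data nodes; without it, any dead nodes in the report would be listed too.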
Create a hosts.include file on a namenode.
$ cat <<EOF | sudo -u hadoop tee /opt/hadoop/hadoop-3.2.2/etc/hadoop/hosts.include
datanode1.example.org
datanode2.example.org
EOF
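Before the whitelist takes effect, it may be worth checking which currently live data nodes it leaves out, since those are the ones the namenode will stop accepting. A rough sketch, assuming one hostname per line on both sides; `not_whitelisted` is a hypothetical name, not a Hadoop tool.

```shell
# Hypothetical helper: print every hostname read from stdin that is NOT
# listed in the include file given as $1 (exact, whole-line comparison).
not_whitelisted() {
  grep -vxFf "$1" || true   # grep exits 1 when it prints nothing; ignore that
}

# Intended use on a cluster (not run here):
#   hdfs dfsadmin -report -live | awk '$1 == "Hostname:" { print $2 }' \
#     | not_whitelisted /opt/hadoop/hadoop-3.2.2/etc/hadoop/hosts.include
```

With the file created above, this would be expected to print datanode3.example.org, the one node that was left off the list.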
Define a dfs.hosts option inside hdfs-site.xml on a namenode.
$ sudo -u hadoop vim /opt/hadoop/hadoop-3.2.2/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/local_data/namenode</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>https://secondarynamenode.example.org:9870</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.hosts</name>
    <value>/opt/hadoop/hadoop-3.2.2/etc/hadoop/hosts.include</value>
  </property>
</configuration>
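A quick way to confirm that the edit landed is to read the dfs.hosts value back from the file. The sketch below is a plain text match, not a real XML parser: it assumes `<value>` directly follows `<name>` inside the property (no `<description>` or comment in between), and `get_hdfs_property` is a hypothetical name.

```shell
# Hypothetical helper: print the <value> of property $1 from the
# Hadoop XML configuration file $2, using simple text matching.
get_hdfs_property() {
  tr -d '\n' < "$2" |
    grep -o "<name>$1</name>[[:space:]]*<value>[^<]*</value>" |
    sed 's/.*<value>\(.*\)<\/value>.*/\1/'
}

# Intended use on the namenode (not run here):
#   get_hdfs_property dfs.hosts /opt/hadoop/hadoop-3.2.2/etc/hadoop/hdfs-site.xml
```

To ask Hadoop itself which value it resolves, `hdfs getconf -confKey dfs.hosts` is the more reliable check.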
Restart the namenode service so that it picks up the new option.
$ sudo systemctl restart hadoop-namenode.service
Inspect data nodes. Only the two whitelisted nodes are listed now.
$ hdfs dfsadmin -report
Configured Capacity: 42007166976 (39.12 GB)
Present Capacity: 34873788205 (32.48 GB)
DFS Remaining: 33006956544 (30.74 GB)
DFS Used: 1866831661 (1.74 GB)
DFS Used%: 5.35%
Replicated Blocks:
    Under replicated blocks: 7578
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    Low redundancy blocks with highest priority to recover: 7578
    Pending deletion blocks: 0
Erasure Coded Block Groups:
    Low redundancy block groups: 0
    Block groups with corrupt internal blocks: 0
    Missing block groups: 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.8.173:9866 (datanode1.example.org)
Hostname: datanode1.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 435533613 (415.36 MB)
Non DFS Used: 2481146067 (2.31 GB)
DFS Remaining: 16996384768 (15.83 GB)
DFS Used%: 2.07%
DFS Remaining%: 80.92%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:19:16 UTC 2021
Last Block Report: Sun Jun 06 22:18:49 UTC 2021
Num of Blocks: 3299

Name: 192.168.8.174:9866 (datanode2.example.org)
Hostname: datanode2.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 1431298048 (1.33 GB)
Non DFS Used: 2471194624 (2.30 GB)
DFS Remaining: 16010571776 (14.91 GB)
DFS Used%: 6.81%
DFS Remaining%: 76.23%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:19:16 UTC 2021
Last Block Report: Sun Jun 06 22:18:49 UTC 2021
Num of Blocks: 10879
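With datanode3 no longer accepted, this report counts 7578 under-replicated blocks. To keep an eye on that figure without reading the whole report, something like the following could be used. It is only a sketch that relies on the "Under replicated blocks" label shown above; `under_replicated` is a hypothetical name.

```shell
# Hypothetical helper: read a dfsadmin report on stdin and print the
# number that follows the first "Under replicated blocks:" label.
under_replicated() {
  awk -F': *' '/Under replicated blocks/ { print $2; exit }'
}

# Intended use on a cluster (not run here):
#   hdfs dfsadmin -report | under_replicated
```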
To whitelist another data node, append its hostname to the hosts.include file on a namenode (one hostname per line).
$ echo datanode3.example.org | sudo -u hadoop tee -a /opt/hadoop/hadoop-3.2.2/etc/hadoop/hosts.include
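Running the append twice would leave a duplicate line in the file. If this step is scripted, a guarded append avoids that. The sketch below assumes the caller is allowed to write the include file; `add_to_whitelist` is a hypothetical name.

```shell
# Hypothetical helper: append hostname $1 to include file $2 only if
# the file does not already contain that exact line.
add_to_whitelist() {
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Intended use, as a user who may write the file (not run here):
#   add_to_whitelist datanode3.example.org /opt/hadoop/hadoop-3.2.2/etc/hadoop/hosts.include
```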
Re-read the include file on a namenode, and make sure the datanode service is running on the newly whitelisted data node.
$ hdfs dfsadmin -refreshNodes
Refresh nodes successful
Display HDFS report.
$ hdfs dfsadmin -report
Configured Capacity: 63010750464 (58.68 GB)
Present Capacity: 52258835796 (48.67 GB)
DFS Remaining: 49377067008 (45.99 GB)
DFS Used: 2881768788 (2.68 GB)
DFS Used%: 5.51%
Replicated Blocks:
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0
Erasure Coded Block Groups:
    Low redundancy block groups: 0
    Block groups with corrupt internal blocks: 0
    Missing block groups: 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.8.173:9866 (datanode1.example.org)
Hostname: datanode1.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 449625849 (428.80 MB)
Non DFS Used: 2480767239 (2.31 GB)
DFS Remaining: 16982671360 (15.82 GB)
DFS Used%: 2.14%
DFS Remaining%: 80.86%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:23:16 UTC 2021
Last Block Report: Sun Jun 06 22:18:49 UTC 2021
Num of Blocks: 3455

Name: 192.168.8.174:9866 (datanode2.example.org)
Hostname: datanode2.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 1418534335 (1.32 GB)
Non DFS Used: 2470548033 (2.30 GB)
DFS Remaining: 16023982080 (14.92 GB)
DFS Used%: 6.75%
DFS Remaining%: 76.29%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:23:16 UTC 2021
Last Block Report: Sun Jun 06 22:18:49 UTC 2021
Num of Blocks: 10721

Name: 192.168.8.175:9866 (datanode3.example.org)
Hostname: datanode3.example.org
Decommission Status : Normal
Configured Capacity: 21003583488 (19.56 GB)
DFS Used: 1013608604 (966.65 MB)
Non DFS Used: 2529042276 (2.36 GB)
DFS Remaining: 16370413568 (15.25 GB)
DFS Used%: 4.83%
DFS Remaining%: 77.94%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jun 06 22:23:15 UTC 2021
Last Block Report: Sun Jun 06 22:23:09 UTC 2021
Num of Blocks: 7582
You can exclude hosts using the dfs.hosts.exclude option, which takes precedence over this whitelist.