Create Yarn nodes whitelist.
Display running Yarn nodes.
$ yarn node -list
2021-06-06 22:40:12,371 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager.example.org/192.168.8.172:8032
Total Nodes:3
                    Node-Id  Node-State           Node-Http-Address  Number-of-Running-Containers
datanode3.example.org:35637     RUNNING  datanode3.example.org:8042                             0
datanode1.example.org:36305     RUNNING  datanode1.example.org:8042                             0
datanode2.example.org:33917     RUNNING  datanode2.example.org:8042                             0
Create a yarn.nodes.include file on a resourcemanager node.
$ cat <<EOF | sudo -u hadoop tee /opt/hadoop/hadoop-3.2.2/etc/hadoop/yarn.nodes.include
datanode1.example.org
datanode2.example.org
EOF
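If the whitelist should start from the nodes that are already running, the hostnames can be derived from the node list instead of being typed by hand. The following is a minimal sketch, assuming the output layout of yarn node -list shown earlier; list_node_hosts is a hypothetical helper name, not part of Hadoop.

```shell
# Read `yarn node -list` output on stdin and print the unique hostnames
# of nodes whose state column is RUNNING.
# Node lines look like: "datanode1.example.org:36305 RUNNING ...".
# Log lines and the header are skipped because their second field
# is not the word RUNNING.
list_node_hosts() {
  awk '$2 == "RUNNING" { split($1, id, ":"); print id[1] }' | sort -u
}

# Usage (review the result before writing it to the include file):
#   yarn node -list | list_node_hosts
```

The output is sorted and deduplicated, so it can be piped to sudo -u hadoop tee in the same way as the here-document above once the list has been reviewed.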
Define the yarn.resourcemanager.nodes.include-path option inside yarn-site.xml on a resourcemanager node.
$ sudo -u hadoop vim /opt/hadoop/hadoop-3.2.2/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

  <!-- Site specific YARN configuration properties -->

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.nodes.include-path</name>
    <value>/opt/hadoop/hadoop-3.2.2/etc/hadoop/yarn.nodes.include</value>
  </property>

</configuration>
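Before restarting anything, it can help to confirm which include path the configuration file actually declares. The following is a rough sketch that reads the value with tr and sed; get_yarn_property is a hypothetical helper, and it assumes each property has <name> immediately followed by <value>, as in the file above. It is not a real XML parser, and the dots in the property name are treated as regex wildcards.

```shell
# Print the <value> that follows <name>$1</name> in the XML file $2.
# The file is flattened to a single line first so that the match works
# whether or not the property is spread over several lines.
# Prints nothing when the property is not present.
get_yarn_property() {
  tr -d '\n' < "$2" |
    sed -n "s|.*<name>$1</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}

# Usage:
#   get_yarn_property yarn.resourcemanager.nodes.include-path \
#     /opt/hadoop/hadoop-3.2.2/etc/hadoop/yarn-site.xml
```

The printed path should match the location of the include file created in the previous step.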
Restart the ResourceManager service on a resourcemanager node.
$ sudo systemctl restart hadoop-yarn-resourcemanager.service
Display running Yarn nodes.
$ yarn node -list
2021-06-06 22:49:17,827 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager.example.org/192.168.8.172:8032
Total Nodes:2
                    Node-Id  Node-State           Node-Http-Address  Number-of-Running-Containers
datanode1.example.org:36305     RUNNING  datanode1.example.org:8042                             0
datanode2.example.org:33917     RUNNING  datanode2.example.org:8042                             0
Append the Yarn node that should be whitelisted to the include file on a resourcemanager node.
$ echo datanode3.example.org | sudo -u hadoop tee -a /opt/hadoop/hadoop-3.2.2/etc/hadoop/yarn.nodes.include
Re-read the include file on a resourcemanager node.
$ yarn rmadmin -refreshNodes
2021-06-06 22:52:19,317 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8033
Display running Yarn nodes.
$ yarn node -list
2021-06-06 22:52:49,604 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager.example.org/192.168.8.172:8032
Total Nodes:2
                    Node-Id  Node-State           Node-Http-Address  Number-of-Running-Containers
datanode1.example.org:36305     RUNNING  datanode1.example.org:8042                             0
datanode2.example.org:33917     RUNNING  datanode2.example.org:8042                             0
Display all Yarn nodes.
$ yarn node -list -all
2021-06-06 22:53:37,559 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager.example.org/192.168.8.172:8032
Total Nodes:3
                    Node-Id  Node-State           Node-Http-Address  Number-of-Running-Containers
datanode3.example.org:46125     RUNNING  datanode3.example.org:8042                             0
datanode1.example.org:36305     RUNNING  datanode1.example.org:8042                             0
datanode2.example.org:33917     RUNNING  datanode2.example.org:8042                             0
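Comparing the include file with the node list by eye gets tedious on larger clusters. The following is a minimal sketch that prints whitelisted hosts not yet reported as RUNNING; missing_hosts is a hypothetical helper, and it assumes one hostname per line in the include file and the output layout shown above.

```shell
# Print hosts from the include file ($1) that do not appear as RUNNING
# in the `yarn node -list -all` output read from stdin.
missing_hosts() {
  # Hostnames of RUNNING nodes, with the ":port" suffix removed.
  # Note: plain sh has no `local`, so this variable is global.
  running=$(awk '$2 == "RUNNING" { sub(/:.*/, "", $1); print $1 }')
  # Drop blank lines, then remove every host that is running.
  # grep exits 1 when nothing is left to print, so mask that status.
  grep -v '^[[:space:]]*$' "$1" | grep -vxF -e "$running" || true
}

# Usage:
#   yarn node -list -all | missing_hosts \
#     /opt/hadoop/hadoop-3.2.2/etc/hadoop/yarn.nodes.include
```

An empty result means every whitelisted host has registered with the ResourceManager.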
Simple as that.