How to decommission a Yarn node

Decommission a Yarn node with minimal impact on the running applications.

Display running Yarn nodes.

$ yarn node -list
2021-05-26 21:41:02,009 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager.example.org/192.168.8.172:8032
Total Nodes:3
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
datanode3.example.org:41963             RUNNING datanode3.example.org:8042                                 0
datanode1.example.org:36073             RUNNING datanode1.example.org:8042                                 0
datanode2.example.org:43967             RUNNING datanode2.example.org:8042                                 0

Create a yarn.nodes.exclude file on the resourcemanager node.

$ sudo -u hadoop touch /opt/hadoop/hadoop-3.2.2/etc/hadoop/yarn.nodes.exclude

Define the yarn.resourcemanager.nodes.exclude-path option inside yarn-site.xml on the resourcemanager node and point it at the file created above.

$ sudo -u hadoop vim /opt/hadoop/hadoop-3.2.2/etc/hadoop/yarn-site.xml 
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->

 <property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
 </property>
 <property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
 <property>
   <name>yarn.resourcemanager.nodes.exclude-path</name>
   <value>/opt/hadoop/hadoop-3.2.2/etc/hadoop/yarn.nodes.exclude</value>
 </property>
</configuration>

Restart the ResourceManager service on the resourcemanager node so that it picks up the new option.

$ sudo systemctl restart hadoop-yarn-resourcemanager.service

Append the host name of the node that needs to be decommissioned to the exclude file (one host name per line) on the resourcemanager node.

$ echo datanode3.example.org | sudo -u hadoop tee -a /opt/hadoop/hadoop-3.2.2/etc/hadoop/yarn.nodes.exclude
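
Running the append twice leaves a duplicate entry in the exclude file. A minimal sketch of an idempotent variant follows; add_to_exclude is a hypothetical helper name, not part of Hadoop, and it writes as the current user, so it is assumed to run in a shell owned by the hadoop user.

```shell
# Hypothetical helper (not part of Hadoop): append a host name to an
# exclude file only if that exact line is not already present.
add_to_exclude() {
    host="$1"
    file="$2"
    # -x matches the whole line, -F treats the host name as a fixed string
    if ! grep -qxF -- "$host" "$file" 2>/dev/null; then
        printf '%s\n' "$host" >> "$file"
    fi
}
```

$ add_to_exclude datanode3.example.org /opt/hadoop/hadoop-3.2.2/etc/hadoop/yarn.nodes.exclude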

Make the ResourceManager re-read the exclude file. Run the command on the resourcemanager node.

$ yarn rmadmin -refreshNodes
2021-05-26 21:54:52,613 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8033
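
The plain -refreshNodes call decommissions the node right away. Hadoop 3.x also documents a graceful form, -refreshNodes -g [timeout in seconds] -client|server, which gives running containers up to the timeout to finish. The sketch below wraps it in a hypothetical helper, graceful_refresh, which rejects a malformed timeout before contacting the ResourceManager. The helper name and the 3600 second value are assumptions for illustration.

```shell
# Hypothetical wrapper around the documented graceful refresh:
#   yarn rmadmin -refreshNodes -g <timeout in seconds> -client
# With -client the command blocks while the timeout is tracked client-side.
graceful_refresh() {
    timeout="$1"
    case "$timeout" in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    yarn rmadmin -refreshNodes -g "$timeout" -client
}
```

$ graceful_refresh 3600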

Display running Yarn nodes. The decommissioned node is no longer listed.

$ yarn node -list
2021-05-26 21:56:25,345 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager.example.org/192.168.8.172:8032
Total Nodes:2
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
datanode1.example.org:36073             RUNNING datanode1.example.org:8042                                 0
datanode2.example.org:43967             RUNNING datanode2.example.org:8042                                 0

Display all Yarn nodes, including the decommissioned one.

$ yarn node -list -all
2021-05-26 21:56:29,658 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager.example.org/192.168.8.172:8032
Total Nodes:3
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
datanode1.example.org:36073             RUNNING datanode1.example.org:8042                                 0
datanode2.example.org:43967             RUNNING datanode2.example.org:8042                                 0
datanode3.example.org:41963      DECOMMISSIONED datanode3.example.org:8042                                 0
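
To check the state of a single node from a script, the listing above can be filtered. A minimal sketch, assuming the column layout shown in the output above (Node-Id first, Node-State second); node_state is a hypothetical helper name.

```shell
# Hypothetical helper: read `yarn node -list -all` output on stdin and
# print the Node-State of the given host. Prints nothing if the host
# does not appear in the listing.
node_state() {
    awk -v host="$1" '
        # Node-Id has the form host:port, so compare the host part only
        { split($1, id, ":") }
        id[1] == host { print $2; exit }
    '
}
```

$ yarn node -list -all | node_state datanode3.example.org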