How to rebalance data across HDFS cluster

The HDFS balancer redistributes blocks so that disk usage is spread evenly across the DataNodes in the cluster.

Inspect the available balancer parameters.

$ sudo -u hadoop -i hdfs balancer --help
Usage: hdfs balancer
        [-policy <policy>]      the balancing policy: datanode or blockpool
        [-threshold <threshold>]        Percentage of disk capacity
        [-exclude [-f <hosts-file> | <comma-separated list of hosts>]]  Excludes the specified datanodes.
        [-include [-f <hosts-file> | <comma-separated list of hosts>]]  Includes only the specified datanodes.
        [-source [-f <hosts-file> | <comma-separated list of hosts>]]   Pick only the specified datanodes as source nodes.
        [-blockpools <comma-separated list of blockpool ids>]   The balancer will only run on blockpools included in this list.
        [-idleiterations <idleiterations>]      Number of consecutive idle iterations (-1 for Infinite) before exit.
        [-runDuringUpgrade]     Whether to run the balancer during an ongoing HDFS upgrade. This is usually not desired since it will not affect used space on over-utilized machines.

Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]
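
As an illustration of the options above, a run that tightens the balancing threshold and restricts the balancer to specific DataNodes could look like this (the 5% threshold and the host list are example values, not recommendations):

```shell
# Balance until every DataNode is within 5% of the average cluster
# utilization (the default threshold is 10%), considering only the
# two listed DataNodes (example addresses).
sudo -u hadoop -i hdfs balancer \
  -threshold 5 \
  -include 192.168.8.173,192.168.8.174
```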

Sample execution. Note that the balancer performs its work in the foreground.

$ sudo -u hadoop -i hdfs balancer
2021-07-19 22:37:39,403 INFO balancer.Balancer: namenodes  = [hdfs://namenode.example.org:9000]
2021-07-19 22:37:39,405 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
2021-07-19 22:37:39,406 INFO balancer.Balancer: included nodes = []
2021-07-19 22:37:39,406 INFO balancer.Balancer: excluded nodes = []
2021-07-19 22:37:39,406 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved  NameNode
2021-07-19 22:37:39,416 INFO balancer.NameNodeConnector: getBlocks calls for hdfs://namenode.example.org:9000 will be rate-limited to 20 per second
2021-07-19 22:37:40,477 INFO balancer.Balancer: dfs.namenode.get-blocks.max-qps = 20 (default=20)
2021-07-19 22:37:40,477 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2021-07-19 22:37:40,477 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
2021-07-19 22:37:40,477 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
2021-07-19 22:37:40,477 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
2021-07-19 22:37:40,477 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
2021-07-19 22:37:40,477 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2021-07-19 22:37:40,477 INFO balancer.Balancer: dfs.datanode.balance.bandwidthPerSec = 10485760 (default=10485760)
2021-07-19 22:37:40,481 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2021-07-19 22:37:40,481 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
2021-07-19 22:37:40,501 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.8.174:9866
2021-07-19 22:37:40,501 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.8.173:9866
2021-07-19 22:37:40,503 INFO balancer.Balancer: 0 over-utilized: []
2021-07-19 22:37:40,503 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Jul 19, 2021, 10:37:40 PM          0                  0 B                 0 B                0 B  hdfs://namenode.example.org:9000
Jul 19, 2021, 10:37:40 PM Balancing took 1.36 seconds
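
Because the balancer blocks the terminal, a long run is typically detached and logged instead; a minimal sketch (the log file path is an arbitrary example):

```shell
# Start the balancer detached from the terminal, capturing its output
# in a log file (path is an example); print the PID so the run can be
# monitored or stopped later.
sudo -u hadoop -i nohup hdfs balancer > /tmp/hdfs-balancer.log 2>&1 &
echo "balancer started with PID $!"
```

Interrupting the balancer is safe: blocks that were already moved stay in place, and the next run simply continues from the current cluster state.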

Notice the default parameters displayed above. You can override them at execution time with the -D generic option.

$ sudo -u hadoop -i hdfs balancer -Ddfs.datanode.balance.bandwidthPerSec=50m
2021-07-19 22:38:06,775 INFO balancer.Balancer: namenodes  = [hdfs://namenode.example.org:9000]
2021-07-19 22:38:06,789 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
2021-07-19 22:38:06,789 INFO balancer.Balancer: included nodes = []
2021-07-19 22:38:06,789 INFO balancer.Balancer: excluded nodes = []
2021-07-19 22:38:06,789 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved  NameNode
2021-07-19 22:38:06,793 INFO balancer.NameNodeConnector: getBlocks calls for hdfs://namenode.example.org:9000 will be rate-limited to 20 per second
2021-07-19 22:38:07,959 INFO balancer.Balancer: dfs.namenode.get-blocks.max-qps = 20 (default=20)
2021-07-19 22:38:07,959 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2021-07-19 22:38:07,959 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
2021-07-19 22:38:07,959 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
2021-07-19 22:38:07,959 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
2021-07-19 22:38:07,959 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
2021-07-19 22:38:07,959 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2021-07-19 22:38:07,959 INFO balancer.Balancer: dfs.datanode.balance.bandwidthPerSec = 52428800 (default=10485760)
2021-07-19 22:38:07,963 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2021-07-19 22:38:07,963 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
2021-07-19 22:38:07,982 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.8.173:9866
2021-07-19 22:38:07,983 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.8.174:9866
2021-07-19 22:38:07,984 INFO balancer.Balancer: 0 over-utilized: []
2021-07-19 22:38:07,984 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Jul 19, 2021, 10:38:07 PM          0                  0 B                 0 B                0 B  hdfs://namenode.example.org:9000
Jul 19, 2021, 10:38:08 PM Balancing took 1.499 seconds
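
The 52428800 shown above is simply 50m expanded to bytes (50 × 1024 × 1024). The same limit can also be changed on a live cluster with hdfs dfsadmin, which tells every DataNode to use a new bandwidth without restarting the balancer (the setting lasts until the DataNodes restart):

```shell
# 50m expands to 50 * 1024 * 1024 = 52428800 bytes per second
echo $((50 * 1024 * 1024))

# Dynamically set the balancer bandwidth on all live DataNodes;
# the value is given in bytes per second.
sudo -u hadoop -i hdfs dfsadmin -setBalancerBandwidth 52428800
```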

Each iteration of the balancing process logs the actions it performs.

2021-07-19 23:14:04,329 INFO balancer.Balancer: Need to move 71.25 GB to make the cluster balanced.
2021-07-19 23:14:04,514 INFO balancer.Balancer: Decided to move 10 GB bytes from 192.168.8.173:9866:DISK to 192.168.8.174:9866:DISK
2021-07-19 23:14:04,736 INFO balancer.Balancer: Will move 10 GB in this iteration

You can inspect the effective value of any of these parameters before execution.

$ sudo -u hadoop -i hdfs getconf -confKey dfs.datanode.balance.bandwidthPerSec
10m
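
A short loop can dump several balancer-related keys at once (the key list below is illustrative, not exhaustive):

```shell
# Print the effective value of each listed balancer-related property
for key in dfs.datanode.balance.bandwidthPerSec \
           dfs.datanode.balance.max.concurrent.moves \
           dfs.balancer.max-size-to-move; do
  printf '%s = %s\n' "$key" "$(sudo -u hadoop -i hdfs getconf -confKey "$key")"
done
```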