Distribute data evenly across all disks on a Hadoop datanode.
Inspect the diskbalancer command, which moves blocks from one disk to another.
$ sudo -u hadoop -i hdfs diskbalancer -help
usage: hdfs diskbalancer [command] [options]

DiskBalancer distributes data evenly between different disks on a
datanode. DiskBalancer operates by generating a plan, that tells datanode
how to move data between disks. Users can execute a plan by submitting it
to the datanode.
To get specific help on a particular command please run

hdfs diskbalancer -help <command>.

 --help <arg>   valid commands are plan | execute | query | cancel | report
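Each subcommand provides its own help page. To review them all at once, a small shell loop will do (a convenience sketch, based on the -help <command> hint above):

for subcommand in plan execute query cancel report; do
  sudo -u hadoop -i hdfs diskbalancer -help "$subcommand"
done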
Report
Inspect the report command.
$ sudo -u hadoop -i hdfs diskbalancer -help report
usage: hdfs diskbalancer -fs http://namenode.uri -report [options]

Report command reports the volume information of given datanode(s), or
prints out the list of nodes that will benefit from running disk
balancer. Top defaults to 100

 --node <arg>   Datanode address, it can be DataNodeID, IP or hostname.
 --report       List nodes that will benefit from running DiskBalancer.
 --top <arg>    specify the number of nodes to be listed which has data imbalance.

E.g.:
hdfs diskbalancer -report
hdfs diskbalancer -report -top 5
hdfs diskbalancer -report -node <file://> | [<DataNodeID|IP|Hostname>,...]
Report datanodes that would benefit from running DiskBalancer.
$ sudo -u hadoop -i hdfs diskbalancer -report
2021-07-20 19:10:39,550 INFO command.Command: Processing report command
2021-07-20 19:10:39,569 INFO balancer.NameNodeConnector: getBlocks calls for hdfs://namenode.example.org:9000 will be rate-limited to 20 per second
2021-07-20 19:10:40,972 INFO command.Command: No top limit specified, using default top value 100.
2021-07-20 19:10:40,972 INFO command.Command: Reporting top 3 DataNode(s) benefiting from running DiskBalancer.
Processing report command
No top limit specified, using default top value 100.
Reporting top 3 DataNode(s) benefiting from running DiskBalancer.
1/3 datanode3.example.org[192.168.8.175:9867] - <2e7522d1-3ee2-4a7e-acca-075f4458180e>: 3 volumes with node data density 0.99.
2/3 datanode1.example.org[192.168.8.173:9867] - <608639ec-ee0d-4b5b-9947-ffd62c203d9e>: 1 volumes with node data density 0.00.
3/3 datanode2.example.org[192.168.8.174:9867] - <66c14752-20a9-4889-8855-6e4c18628934>: 1 volumes with node data density 0.00.
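To keep the listing short on a larger cluster, limit it with the --top option mentioned in the help above, e.g. to the five most imbalanced datanodes:

$ sudo -u hadoop -i hdfs diskbalancer -report -top 5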
Report volume information for a specific datanode.
$ sudo -u hadoop -i hdfs diskbalancer -report -node datanode3.example.org
2021-07-20 19:10:54,847 INFO command.Command: Processing report command
2021-07-20 19:10:54,868 INFO balancer.NameNodeConnector: getBlocks calls for hdfs://namenode.example.org:9000 will be rate-limited to 20 per second
2021-07-20 19:10:56,224 INFO command.Command: Reporting volume information for DataNode(s). These DataNode(s) are parsed from 'datanode3.example.org'.
Processing report command
Reporting volume information for DataNode(s). These DataNode(s) are parsed from 'datanode3.example.org'.
datanode3.example.org[192.168.8.175:9867] - <2e7522d1-3ee2-4a7e-acca-075f4458180e>: 3 volumes with node data density 0.98.
[DISK: volume-/opt/hadoop/local_data/datanode/] - 0.03 used: 703365141/21003583488, 0.97 free: 20300218347/21003583488, isFailed: False, isReadOnly: False, isSkip: False, isTransient: False.
[DISK: volume-/opt/hadoop/local_data/disk2/] - 0.68 used: 7169409024/10501771264, 0.32 free: 3332362240/10501771264, isFailed: False, isReadOnly: False, isSkip: False, isTransient: False.
[DISK: volume-/opt/hadoop/local_data/disk3/] - 0.70 used: 7304679424/10501771264, 0.30 free: 3197091840/10501771264, isFailed: False, isReadOnly: False, isSkip: False, isTransient: False.
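According to the help output, the --node option also accepts a comma-separated list, so several datanodes can be reported in one go:

$ sudo -u hadoop -i hdfs diskbalancer -report -node datanode1.example.org,datanode3.example.org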
Plan
Inspect the plan command.
$ sudo -u hadoop -i hdfs diskbalancer -help plan
usage: hdfs diskbalancer -plan <hostname> [options]

Creates a plan that describes how much data should be moved between disks.

 --bandwidth <arg>             Maximum disk bandwidth (MB/s) in integer to be consumed by diskBalancer. e.g. 10 MB/s.
 --maxerror <arg>              Describes how many errors can be tolerated while copying between a pair of disks.
 --out <arg>                   Local path of file to write output to, if not specified defaults will be used.
 --plan <arg>                  Hostname, IP address or UUID of datanode for which a plan is created.
 --thresholdPercentage <arg>   Percentage of data skew that is tolerated before disk balancer starts working. For example, if total data on a 2 disk node is 100 GB then disk balancer calculates the expected value on each disk, which is 50 GB. If the tolerance is 10% then data on a single disk needs to be more than 60 GB (50 GB + 10% tolerance value) for Disk balancer to balance the disks.
 --v                           Print out the summary of the plan on console

Plan command creates a set of steps that represent a planned data move. A
plan file can be executed on a data node, which will balance the data.
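The defaults can be adjusted when creating a plan; for example, capping disk bandwidth at 20 MB/s and lowering the tolerated skew to 5% (illustrative values, not used in the run below):

$ sudo -u hadoop -i hdfs diskbalancer -plan datanode3.example.org --bandwidth 20 --thresholdPercentage 5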
Create a plan for a specific datanode.
$ sudo -u hadoop -i hdfs diskbalancer -plan datanode3.example.org
2021-07-20 19:11:27,993 INFO balancer.NameNodeConnector: getBlocks calls for hdfs://namenode.example.org:9000 will be rate-limited to 20 per second
2021-07-20 19:11:29,616 INFO planner.GreedyPlanner: Starting plan for Node : datanode3.example.org:9867
2021-07-20 19:11:29,619 INFO planner.GreedyPlanner: Disk Volume set fc43e668-7855-4135-980b-795b64f25fa0 Type : DISK plan completed.
2021-07-20 19:11:29,619 INFO planner.GreedyPlanner: Compute Plan for Node : datanode3.example.org:9867 took 27 ms
2021-07-20 19:11:29,620 INFO command.Command: Writing plan to:
2021-07-20 19:11:29,620 INFO command.Command: /system/diskbalancer/2021-Jul-20-19-11-29/datanode3.example.org.plan.json
Writing plan to:
/system/diskbalancer/2021-Jul-20-19-11-29/datanode3.example.org.plan.json
Inspect the plan directory.
$ sudo -u hadoop -i hdfs dfs -ls /system/diskbalancer/2021-Jul-20-19-11-29/
Found 2 items
-rw-r--r--   2 hadoop supergroup       2131 2021-07-20 19:11 /system/diskbalancer/2021-Jul-20-19-11-29/datanode3.example.org.before.json
-rw-r--r--   2 hadoop supergroup       1624 2021-07-20 19:11 /system/diskbalancer/2021-Jul-20-19-11-29/datanode3.example.org.plan.json
Inspect the current volume layout stored in the before.json file.
$ sudo -u hadoop -i hdfs dfs -cat /system/diskbalancer/2021-Jul-20-19-11-29/datanode3.example.org.before.json | jq
{ "exclusionList": [], "inclusionList": [], "nodes": [ { "nodeDataDensity": 0, "volumeSets": { "DISK": { "volumes": [ { "path": null, "capacity": 21003583488, "storageType": "DISK", "used": 15622008832, "reserved": 0, "uuid": "DS-66764d91-1c27-4d08-8370-1d03d7fed867", "failed": false, "volumeDataDensity": 0, "skip": false, "transient": false, "readOnly": false } ], "storageType": "DISK", "setID": "f0700f72-d21e-451c-8704-3c4880621afb", "transient": false } }, "dataNodeUUID": "608639ec-ee0d-4b5b-9947-ffd62c203d9e", "dataNodeIP": "192.168.8.173", "dataNodePort": 9867, "dataNodeName": "datanode1.example.org", "volumeCount": 1 }, { "nodeDataDensity": 0, "volumeSets": { "DISK": { "volumes": [ { "path": null, "capacity": 21003583488, "storageType": "DISK", "used": 17054421373, "reserved": 0, "uuid": "DS-4c9ae720-729b-4cd7-b791-6f20dd786011", "failed": false, "volumeDataDensity": 0, "skip": false, "transient": false, "readOnly": false } ], "storageType": "DISK", "setID": "8a9ceee9-4b67-4746-a7f7-8a90f8314a41", "transient": false } }, "dataNodeUUID": "66c14752-20a9-4889-8855-6e4c18628934", "dataNodeIP": "192.168.8.174", "dataNodePort": 9867, "dataNodeName": "datanode2.example.org", "volumeCount": 1 }, { "nodeDataDensity": 0.9834, "volumeSets": { "DISK": { "volumes": [ { "path": null, "capacity": 10501771264, "storageType": "DISK", "used": 7304679424, "reserved": 0, "uuid": "DS-3906f751-623f-4c55-a6e9-cf726fecad75", "failed": false, "volumeDataDensity": -0.3342, "skip": false, "transient": false, "readOnly": false }, { "path": null, "capacity": 10501771264, "storageType": "DISK", "used": 7169409024, "reserved": 0, "uuid": "DS-80c3b8d2-a32e-45e3-b363-16baf5992910", "failed": false, "volumeDataDensity": -0.3213, "skip": false, "transient": false, "readOnly": false }, { "path": null, "capacity": 21003583488, "storageType": "DISK", "used": 703365141, "reserved": 0, "uuid": "DS-ba9f8a9a-3cd9-43ea-9d8d-ee85f0a5fa80", "failed": false, "volumeDataDensity": 0.3279, "skip": false, "transient": false, "readOnly": false } ], "storageType": "DISK", "setID": "561d8227-ab6a-4918-86c1-b72c5b95f5a1", "transient": false } }, "dataNodeUUID": "2e7522d1-3ee2-4a7e-acca-075f4458180e", "dataNodeIP": "192.168.8.175", "dataNodePort": 9867, "dataNodeName": "datanode3.example.org", "volumeCount": 3 } ], "threshold": 0, "output": null }
Inspect the created plan.
$ sudo -u hadoop -i hdfs dfs -cat /system/diskbalancer/2021-Jul-20-19-11-29/datanode3.example.org.plan.json | jq
{ "volumeSetPlans": [ { "@class": "org.apache.hadoop.hdfs.server.diskbalancer.planner.MoveStep", "sourceVolume": { "path": "/opt/hadoop/local_data/disk3/", "capacity": 10501771264, "storageType": "DISK", "used": 3794289957, "reserved": 0, "uuid": "DS-3906f751-623f-4c55-a6e9-cf726fecad75", "failed": false, "volumeDataDensity": 9.999999999998899e-05, "skip": false, "transient": false, "readOnly": false }, "destinationVolume": { "path": "/opt/hadoop/local_data/datanode/", "capacity": 21003583488, "storageType": "DISK", "used": 7588594714, "reserved": 0, "uuid": "DS-ba9f8a9a-3cd9-43ea-9d8d-ee85f0a5fa80", "failed": false, "volumeDataDensity": 9.999999999998899e-05, "skip": false, "transient": false, "readOnly": false }, "idealStorage": 0.3613, "bytesToMove": 3510389467, "volumeSetID": "fc43e668-7855-4135-980b-795b64f25fa0" }, { "@class": "org.apache.hadoop.hdfs.server.diskbalancer.planner.MoveStep", "sourceVolume": { "path": "/opt/hadoop/local_data/disk2/", "capacity": 10501771264, "storageType": "DISK", "used": 3794568918, "reserved": 0, "uuid": "DS-80c3b8d2-a32e-45e3-b363-16baf5992910", "failed": false, "volumeDataDensity": 0, "skip": false, "transient": false, "readOnly": false }, "destinationVolume": { "path": "/opt/hadoop/local_data/datanode/", "capacity": 21003583488, "storageType": "DISK", "used": 7588594714, "reserved": 0, "uuid": "DS-ba9f8a9a-3cd9-43ea-9d8d-ee85f0a5fa80", "failed": false, "volumeDataDensity": 9.999999999998899e-05, "skip": false, "transient": false, "readOnly": false }, "idealStorage": 0.3613, "bytesToMove": 3374840106, "volumeSetID": "fc43e668-7855-4135-980b-795b64f25fa0" } ], "nodeName": "datanode3.example.org", "nodeUUID": "2e7522d1-3ee2-4a7e-acca-075f4458180e", "port": 9867, "timeStamp": 1626808289619 }
Execute
Inspect the execute command.
$ sudo -u hadoop -i hdfs diskbalancer -help execute
usage: hdfs diskbalancer -execute <planfile>

Execute command runs a submits a plan for execution on the given data node.

 --execute <arg>   Takes a plan file and submits it for execution by the datanode.
 --skipDateCheck   skips the date check and force execute the plan

Execute command submits the job to data node and returns immediately. The
state of job can be monitored via query command.
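Note the --skipDateCheck option: the datanode rejects plan files it considers outdated, so executing an old plan would require skipping the date check (shown here only as an illustration, not needed in this session):

$ sudo -u hadoop -i hdfs diskbalancer -execute /system/diskbalancer/2021-Jul-20-19-11-29/datanode3.example.org.plan.json --skipDateCheck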
Execute the plan.
$ sudo -u hadoop -i hdfs diskbalancer -execute /system/diskbalancer/2021-Jul-20-19-11-29/datanode3.example.org.plan.json
2021-07-20 19:15:12,781 INFO command.Command: Executing "execute plan" command
Query
Inspect the query command.
$ sudo -u hadoop -i hdfs diskbalancer -help query
usage: hdfs diskbalancer -query <hostname> [options]

Query Plan queries a given data node about the current state of disk
balancer execution.

 --query <arg>   Queries the disk balancer status of a given datanode.
 --v             Prints details of the plan that is being executed on the node.

Query command retrievs the plan ID and the current running state.
Query the current plan status.
$ sudo -u hadoop -i hdfs diskbalancer -query datanode3.example.org
2021-07-20 19:15:26,630 INFO command.Command: Executing "query plan" command.
Plan File: /system/diskbalancer/2021-Jul-20-19-11-29/datanode3.example.org.plan.json
Plan ID: 5ff5adbbb1527e00264dcf6da7846bccf8538ba4
Result: PLAN_DONE
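As the execute command returns immediately, the query command can be polled until the result reaches PLAN_DONE; a minimal sketch:

until sudo -u hadoop -i hdfs diskbalancer -query datanode3.example.org 2>&1 | grep -q PLAN_DONE; do
  sleep 10
done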
Cancel
Inspect the cancel command.
$ sudo -u hadoop -i hdfs diskbalancer -help cancel
usage: hdfs diskbalancer -cancel <planFile> | -cancel <planID> -node <hostname>

Cancel command cancels a running disk balancer operation.

 --cancel <arg>   Cancels a running plan using a plan file.
 --node <arg>     Cancels a running plan using a plan ID and hostName

Cancel command can be run via pointing to a plan file, or by reading the
plan ID using the query command and then using planID and hostname.

Examples of how to run this command are
hdfs diskbalancer -cancel <planfile>
hdfs diskbalancer -cancel <planID> -node <hostname>
Abort a specific plan.
$ sudo -u hadoop -i hdfs diskbalancer -cancel /system/diskbalancer/2021-Jul-20-19-18-25/datanode3.example.org.plan.json
2021-07-20 19:19:14,980 INFO command.Command: Executing "Cancel plan" command.
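Alternatively, cancel by plan ID and hostname, as shown in the help above; here with the plan ID reported by the earlier query command, purely for illustration:

$ sudo -u hadoop -i hdfs diskbalancer -cancel 5ff5adbbb1527e00264dcf6da7846bccf8538ba4 -node datanode3.example.org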