Adding new DataNodes to the cluster

Hello everyone. If I add new DataNodes to the cluster, will HDFS move blocks to the newly added nodes to balance disk space utilization across the nodes?

Thanks in advance.

No, HDFS will not move blocks to new nodes automatically. However, newly created files will likely have their blocks placed on the new nodes.
If you want to rebalance the cluster, here’s what you can do:
a. Select a subset of files that take up a good percentage of your disk space; copy them to new locations in HDFS; remove the old copies of the files; rename the new copies to their original names.
b. A simpler way, with no interruption of service, is to raise the replication factor of the files, wait for the transfers to stabilize, and then lower the replication factor back to its original value.
c. Yet another way to rebalance blocks is to shut down the DataNode that is full, wait until its blocks have been re-replicated on other nodes, and then bring it back up. The resulting over-replicated blocks are removed at random from different nodes, so the blocks really are rebalanced, not just removed from that one node.
d. Finally, you can use the bin/start-balancer.sh command to run a balancing process to move blocks around the cluster automatically.
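To give a rough idea of what "balanced" means for option d: the balancer compares each DataNode's disk utilization with the average utilization of the whole cluster and treats a node as balanced when the difference is within a threshold (10 percentage points by default). The Python sketch below is only a simplified model of that rule, not the balancer's actual code; the function name, node names, and numbers are made up for illustration.

```python
from typing import Dict, Tuple


def classify_nodes(
    usage: Dict[str, Tuple[float, float]],
    threshold: float = 10.0,
) -> Dict[str, str]:
    """Classify nodes relative to the cluster-wide average utilization.

    usage maps a (hypothetical) node name to (used, capacity), both in
    the same unit. threshold is in percentage points.
    """
    total_used = sum(used for used, _ in usage.values())
    total_capacity = sum(capacity for _, capacity in usage.values())
    cluster_pct = 100.0 * total_used / total_capacity

    result = {}
    for node, (used, capacity) in usage.items():
        node_pct = 100.0 * used / capacity
        if node_pct > cluster_pct + threshold:
            result[node] = "over"      # candidate source of blocks
        elif node_pct < cluster_pct - threshold:
            result[node] = "under"     # candidate target for blocks
        else:
            result[node] = "balanced"
    return result


# Two nearly full old nodes plus one freshly added, empty node (GB).
cluster = {
    "dn1": (900.0, 1000.0),
    "dn2": (850.0, 1000.0),
    "dn3-new": (0.0, 1000.0),
}
print(classify_nodes(cluster))
# {'dn1': 'over', 'dn2': 'over', 'dn3-new': 'under'}
```

In this example the cluster average is about 58%, so both old nodes sit above the 10-point band and the new node sits below it. On a real cluster the threshold can be passed to the balancer itself, for example bin/start-balancer.sh -threshold 5 for a tighter balance at the cost of moving more data.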
Hope this helps :slight_smile:

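Sometimes a DataNode may not start. In that case you may need to format the file system and delete the temporary files, then start it again, and it should work. Note that formatting erases the existing HDFS metadata, so this is only suitable when the data on the cluster does not need to be kept.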

Hi all,
I am new to this forum, happy to join here.
Thanks for your information.