Hadoop 2.x Administration Cookbook

Configuring HDFS replication

For redundancy, it is important to have multiple copies of data. In HDFS, this is achieved by placing copies of each block on different nodes. By default, the replication factor is 3, which means that every block written to HDFS is stored as three copies in total, spread across the nodes in the cluster.
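
To see the effect of this setting, you can check the replication factor that HDFS has recorded for a file. The following is a minimal sketch; the path /tmp/file1.txt is only an example and assumes such a file already exists in HDFS:

    $ hdfs dfs -stat %r /tmp/file1.txt
    3

The %r format specifier prints the replication factor of the file, which for newly written files defaults to the value of dfs.replication.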

Before changing the replication setting, make sure that the cluster is healthy and that the user can perform file operations on it.

Getting ready

Log in to any of the nodes in the cluster. It is best to use the edge node, as stated in Chapter 1, and switch to the user hadoop.

Create a simple text file named file1.txt using your favorite text editor, and write some content to it.
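
If you prefer the command line, a one-liner such as the following is enough; the contents of the file are arbitrary:

    $ echo "This is a test file for the HDFS replication recipe" > file1.txt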

How to do it...

  1. ssh to the Namenode, which in this case is nn1.cluster1.com, and switch to user hadoop.
  2. Navigate to the /opt/cluster/hadoop/etc/hadoop directory. This is the directory where we installed Hadoop in Chapter 1, Hadoop Architecture and Deployment. If you installed it at a different location, navigate to that directory instead.
  3. Configure the dfs.replication parameter in the hdfs-site.xml file in this directory; a sample property block is shown after this list.
  4. See the following screenshot for this configuration:
    [Screenshot: the dfs.replication property set in hdfs-site.xml]
  5. Once the change is made, save the file and apply the same change to hdfs-site.xml on all nodes in the cluster.
  6. Restart the Namenode and Datanode daemons across the cluster. The easiest way to do this is with the stop-dfs.sh and start-dfs.sh commands, shown after this list.
  7. See the following screenshot, which shows the way to restart the daemons:
    [Screenshot: restarting the HDFS daemons with stop-dfs.sh and start-dfs.sh]
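
As a reference for step 3, the dfs.replication property block in hdfs-site.xml looks like the following; the value 3 shown here is the default, and you would change it to the replication factor you want:

    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>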
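
For step 6, the daemons can be restarted from the Namenode using the scripts shipped in the Hadoop sbin directory. The path below assumes the installation location used in Chapter 1; adjust it if your installation differs:

    $ /opt/cluster/hadoop/sbin/stop-dfs.sh
    $ /opt/cluster/hadoop/sbin/start-dfs.sh
    $ jps    # confirm that the NameNode and DataNode processes are running again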

How it works...

The dfs.replication parameter is usually set to the same value across the cluster, but it can be configured differently on individual nodes. The replication factor applied to a file is determined by the client node that performs the write. For example, if an edge node has replication set to 2, the blocks of any file written from that node will be replicated twice, irrespective of the value configured on the Namenode.
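
You can verify this behavior from the command line. The following sketch assumes the file1.txt created in the Getting ready section and a writable /tmp directory in HDFS; it writes the file twice, once with the configured default and once with a per-command override of dfs.replication, and then prints the replication factor recorded for each copy:

    $ hdfs dfs -put file1.txt /tmp/file1.txt
    $ hdfs dfs -D dfs.replication=2 -put file1.txt /tmp/file1_rep2.txt
    $ hdfs dfs -stat %r /tmp/file1.txt /tmp/file1_rep2.txt

The second file will report a replication factor of 2, irrespective of the value configured on the Namenode, because the setting is resolved on the client that performs the write.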

See also

  • The Configuring HDFS block size recipe