
Time for action – changing the base HDFS directory
Let's first set the base directory that specifies the location on the local filesystem under which Hadoop will keep all its data. Carry out the following steps:
- Create a directory into which Hadoop will store its data:
$ mkdir /var/lib/hadoop
- Ensure the directory is writable by any user:
$ chmod 777 /var/lib/hadoop
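The two steps above can be combined and verified in one short script. This is only a sketch: it uses a scratch path created with mktemp, since writing under /var/lib typically requires root (substitute /var/lib/hadoop and run with sudo for the real setup), and stat -c is the GNU coreutils form, so on other systems the flag may differ:

```shell
# Sketch: create a Hadoop data directory and confirm it is world-writable.
# Scratch path used for illustration; replace with /var/lib/hadoop in practice.
dir=$(mktemp -d)/hadoop
mkdir -p "$dir"
chmod 777 "$dir"
stat -c '%a' "$dir"   # prints 777 on Linux (GNU stat)
```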
- Modify core-site.xml once again to add the following property:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop</value>
</property>
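For context, the file as a whole would now look something like the following sketch. It assumes core-site.xml already held the fs.default.name property from the earlier pseudo-distributed configuration; the URI shown is an assumption and may differ in your setup:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed from the earlier setup; your value may differ. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <!-- Newly added: base directory for all Hadoop data. -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/lib/hadoop</value>
  </property>
</configuration>
```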
What just happened?
As we will be storing data in Hadoop, and all the various components are running on our local host, this data will need to be stored somewhere on our local filesystem. Regardless of the mode in which Hadoop runs, it uses the hadoop.tmp.dir property by default as the base directory under which all files and data are written. MapReduce, for example, uses a /mapred directory under this base; HDFS uses /dfs. The danger is that the default value of hadoop.tmp.dir is /tmp, and some Linux distributions delete the contents of /tmp on each reboot. It is therefore much safer to state explicitly where the data is to be held.