Big Data Architect’s Handbook

Preparing the data file for analysis

Create a text file in your file system and enter some dummy data that we can use in our calculation. Each line of the file has the following format:

Utility Name<space>Year-Month<space>Amount

Alternatively, you can download a text file from the GitHub repository at https://github.com/smfahad/BDAH-chapter3-example-data.git. The following is an example of our text file:

electricity 2017-01 2000
gas 2017-01 700
telephone 2017-01 1150
electricity 2017-02 2230
gas 2017-02 850
telephone 2017-02 1350
...
...

log.txt file sample data
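Before copying the file anywhere, it can be worth checking that every line really follows the three-field format described earlier. The following is a minimal sketch of such a check, not part of the original example: the function name check_log_format is hypothetical, and it assumes that the utility name is a single word, that the month is written as YYYY-MM, and that the amount is a whole number, as in the sample data.

```shell
# check_log_format FILE  (hypothetical helper, not from the book's repository)
# Succeeds (exit status 0) only if every non-empty line of FILE has the form
#   <utility name> <YYYY-MM> <whole-number amount>
# Offending lines are printed to stderr together with their line numbers.
check_log_format() {
    awk '
        BEGIN { bad = 0 }
        NF == 0 { next }    # skip blank lines
        NF != 3 ||
        $2 !~ /^[0-9][0-9][0-9][0-9]-(0[1-9]|1[0-2])$/ ||
        $3 !~ /^[0-9]+$/ {
            printf "line %d: %s\n", NR, $0 > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

For example, running check_log_format log.txt && echo "log.txt looks well formed" prints the message only when no malformed line is found.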

Now create the utility directory and copy this file into HDFS by executing the following commands:

$ hdfs dfs -mkdir /utility/ 
$ hdfs dfs -put log.txt /utility/
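To confirm that the copy succeeded, you can check that the file exists in HDFS and look at its first few lines. The following is a minimal sketch under a few assumptions: the function name verify_upload is hypothetical, the hdfs command is expected to be on your PATH, and the number of lines shown (five) is an arbitrary choice.

```shell
# verify_upload HDFS_PATH  (hypothetical helper, not from the book's repository)
# Fails with a message on stderr if HDFS_PATH does not exist in HDFS;
# otherwise prints the first five lines of the file.
verify_upload() {
    # "hdfs dfs -test -e" returns 0 only when the path exists
    hdfs dfs -test -e "$1" || { echo "missing in HDFS: $1" >&2; return 1; }
    # Stream the file contents and keep only the beginning
    hdfs dfs -cat "$1" | head -n 5
}
```

With the commands above already executed, verify_upload /utility/log.txt should print the first lines of the sample data.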