Use cases of the read_csv method
The read_csv method can be put to a variety of uses. Let us look at some of them.
Passing the directory address and filename as variables
Sometimes it is easier and more maintainable to pass the directory address and the filename as variables instead of hard-coding them. This matters most when the full address of the file is needed many times. Let us see how to do so while importing a dataset.
import pandas as pd

path = 'E:/Personal/Learning/Datasets/Book'
filename = 'titanic3.csv'
# Build the full address by string concatenation
fullpath = path + '/' + filename
data = pd.read_csv(fullpath)
Alternatively, one can use the following snippet, which builds the full address with the join function of the os.path module:
import pandas as pd
import os

path = 'E:/Personal/Learning/Datasets/Book'
filename = 'titanic3.csv'
# os.path.join inserts the path separator between the parts
fullpath = os.path.join(path, filename)
data = pd.read_csv(fullpath)
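One advantage of the latter approach is that os.path.join adds a separator only where one is needed, so a directory address that already ends with a slash does not produce a doubled separator, and it uses the separator appropriate to the operating system. Note that it does not strip leading or trailing white space from the parts it joins.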
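The difference between plain concatenation and os.path.join can be checked directly. The sketch below uses posixpath and ntpath, the modules that os.path resolves to on Linux/macOS and on Windows respectively, so the results do not depend on the machine it runs on. The folder and file names are the ones used above; the variants with a trailing slash and a leading space are hypothetical inputs added for illustration.

```python
import ntpath      # the implementation os.path uses on Windows
import posixpath   # the implementation os.path uses on Linux and macOS

path = 'E:/Personal/Learning/Datasets/Book'
filename = 'titanic3.csv'

# Manual concatenation always inserts '/', even if path already ends with one
concatenated = path + '/' + filename
concatenated_trailing = (path + '/') + '/' + filename   # doubled separator

# join adds a separator only when the first part does not already end with one
joined_posix = posixpath.join(path, filename)
joined_posix_trailing = posixpath.join(path + '/', filename)

# The Windows flavour of join uses a backslash, the native separator
joined_windows = ntpath.join(path, filename)

# join keeps white space as-is: the leading space stays in the result
joined_with_space = posixpath.join(path, ' titanic3.csv')

print(concatenated)           # E:/Personal/Learning/Datasets/Book/titanic3.csv
print(concatenated_trailing)  # E:/Personal/Learning/Datasets/Book//titanic3.csv
print(joined_posix_trailing)  # E:/Personal/Learning/Datasets/Book/titanic3.csv
print(joined_windows)         # E:/Personal/Learning/Datasets/Book\titanic3.csv
print(repr(joined_with_space))
```

In everyday code, calling os.path.join picks the right flavour automatically; the explicit modules are used here only to make the output predictable.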
Reading a .txt dataset with a comma delimiter
Download the Customer Churn Model.txt dataset from the Google Drive folder and save it on your local drive. The following code snippet reads it:
import pandas as pd

# read_csv is a pandas function, so it is called as pd.read_csv
data = pd.read_csv('E:/Personal/Learning/Datasets/Book/Customer Churn Model.txt')
As you can see, although it is a text file, it can be read with the read_csv method without specifying any other argument, because its values are separated by commas, and the comma is the default delimiter of read_csv.
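To confirm that the file extension plays no role, the sketch below writes a small comma-separated file with a .txt extension to a temporary folder and reads it without any sep argument. The file name sample.txt and its contents are hypothetical stand-ins for the churn data. The last step shows that a tab-delimited version of the same content would need sep='\t'.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical comma-separated content (not the real churn dataset)
content = "id,value,label\n1,10,x\n2,20,y\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    txt_path = os.path.join(tmp_dir, 'sample.txt')
    with open(txt_path, 'w') as f:
        f.write(content)
    # No sep argument: read_csv splits on ',' by default, whatever the extension
    data = pd.read_csv(txt_path)

print(data.shape)             # (2, 3)
print(data.columns.tolist())  # ['id', 'value', 'label']

# The same data with tab delimiters has to be read with sep='\t'
tab_data = pd.read_csv(io.StringIO(content.replace(',', '\t')), sep='\t')
print(tab_data.equals(data))  # True
```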
Specifying the column names of a dataset from a list
In the last segment, we read the Customer Churn Model.txt file with its default column names. But what if we want to rename some or all of the columns? Or what if the file has no column names and we want to assign them from a list (say, one stored in a CSV file)?
Look for a CSV file called Customer Churn Columns.csv in the Google Drive folder and download it. I have put letters of the English alphabet in this file as placeholders for the column names. We shall use this file to create a list of column names to be passed on to the dataset. You can change the names in the CSV file, if you like, and see how they are reflected in the column names.
The following code snippet lists the column names of the dataset we just read:
import pandas as pd

data = pd.read_csv('E:/Personal/Learning/Datasets/Book/Customer Churn Model.txt')
# columns holds the column labels; .values returns them as a NumPy array
data.columns.values
If you run it in an IDE, you should get output like that in the following screenshot:
Fig. 2.2: The column names in the Customer Churn Model.txt dataset
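Because data.columns is a pandas Index, .values hands back its labels as a NumPy array, whereas .tolist() gives a plain Python list, which is often handier for further processing. The small in-memory frame below, with hypothetical column names, stands in for the churn dataset.

```python
import io

import pandas as pd

# Hypothetical one-row dataset read from an in-memory string
data = pd.read_csv(io.StringIO("id,value,label\n1,10,x\n"))

names_array = data.columns.values    # NumPy array of the column labels
names_list = data.columns.tolist()   # the same labels as a Python list

print(type(names_array).__name__)    # ndarray
print(names_list)                    # ['id', 'value', 'label']
```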
This lists all the column names of the dataset. Let us now go ahead and change the column names to the names we have in the Customer Churn Columns.csv file.
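import pandas as pd

# Read the file that holds the new column names
data_columns = pd.read_csv('E:/Personal/Learning/Predictive Modeling Book/Book Datasets/Customer Churn Columns.csv')
# Subset the Column_Names column and convert it to a list
data_column_list = data_columns['Column_Names'].tolist()
# Read the dataset again, assigning the names from the list
data = pd.read_csv('E:/Personal/Learning/Predictive Modeling Book/Book Datasets/Customer Churn Model.txt', header=None, names=data_column_list)
data.columns.values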
The output after running this snippet should look like the following screenshot (if you haven't made any changes to the values in the Customer Churn Columns.csv file):
Fig. 2.3: The column names from the Customer Churn Columns.csv file, which have been passed to the data frame data
The key steps in this process are:
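- Sub-setting the particular column (containing the column names) and converting it to a list, done in the line that creates data_column_list
- Passing header=None and names=<name of the list containing the column names> (data_column_list in this case) to the read_csv method
Note that header=None tells read_csv that the file has no header row. Customer Churn Model.txt already has one, so with header=None its original header line is read in as the first row of data. To replace an existing header instead, pass header=0 together with names.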
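The two key steps can be sketched on small in-memory stand-ins for the two files. The strings columns_csv and model_txt are hypothetical; only the Column_Names header mirrors the snippet above. The sketch also contrasts header=None with header=0 for a file that already has a header row.

```python
import io

import pandas as pd

# Hypothetical stand-ins for Customer Churn Columns.csv and Customer Churn Model.txt
columns_csv = "Column_Names\nA\nB\nC\n"
model_txt = "id,value,label\n1,10,x\n2,20,y\n"

# Step 1: subset the Column_Names column and convert it to a list
data_column_list = pd.read_csv(io.StringIO(columns_csv))['Column_Names'].tolist()

# Step 2a: header=None treats every line as data, so the old header becomes row 0
with_header_none = pd.read_csv(io.StringIO(model_txt), header=None,
                               names=data_column_list)

# Step 2b: header=0 consumes the old header line and replaces it with the new names
with_header_zero = pd.read_csv(io.StringIO(model_txt), header=0,
                               names=data_column_list)

print(data_column_list)                   # ['A', 'B', 'C']
print(with_header_none.shape)             # (3, 3)
print(with_header_none.iloc[0].tolist())  # ['id', 'value', 'label']
print(with_header_zero.shape)             # (2, 3)
```

With header=None the numeric columns also end up holding strings, because the old header text shares a column with the numbers; header=0 avoids that when the file already has a header.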
If some of the terms, such as sub-setting, don't make sense now, just remember that sub-setting means selecting a particular combination of rows or columns of a dataset. We will discuss it in detail in the next chapter.
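As a quick preview, a few common forms of sub-setting on a small hypothetical data frame are shown below: selecting one column, converting it to a list (as done with Column_Names above), selecting several columns, and selecting rows and columns together by label.

```python
import pandas as pd

# A small hypothetical data frame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['x', 'y', 'z']})

one_column = df['A']                  # a single column, returned as a Series
column_as_list = df['A'].tolist()     # that column as a Python list
two_columns = df[['A', 'C']]          # several columns, returned as a DataFrame
# .loc selects by label; the row slice 0:1 includes both endpoints
rows_and_columns = df.loc[0:1, ['B', 'C']]

print(column_as_list)          # [1, 2, 3]
print(rows_and_columns.shape)  # (2, 2)
```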