Building a deep learning model
Now that we have covered the basics, let's look at building our first true deep learning model! We will use the UCI HAR dataset that we used in Chapter 2, Training a Prediction Model. The following code does some data preparation: it loads the data and selects only the columns that store mean and standard deviation values (those with mean() or std() in the column name). The y values range from 1 to 6; we subtract one so that they range from 0 to 5. The code for this section is in Chapter4/uci_har.R. It requires the UCI HAR dataset to be in the data folder; download it from https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones and unzip it into the data folder (a short, optional download script is sketched after the loading code below):
library(mxnet)

train.x <- read.table("../data/UCI HAR Dataset/train/X_train.txt")
train.y <- read.table("../data/UCI HAR Dataset/train/y_train.txt")[[1]]
test.x <- read.table("../data/UCI HAR Dataset/test/X_test.txt")
test.y <- read.table("../data/UCI HAR Dataset/test/y_test.txt")[[1]]
features <- read.table("../data/UCI HAR Dataset/features.txt")
meanSD <- grep("mean\\(\\)|std\\(\\)", features[, 2])
train.y <- train.y-1
test.y <- test.y-1
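If you prefer to script the download step rather than fetching the archive manually, something like the following should work; the direct .zip URL and file name here are assumptions based on the UCI archive layout, so adjust them if the page has moved:
# Assumed direct link to the archive; verify on the dataset page if the download fails
zip_url  <- "https://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip"
zip_file <- "../data/uci_har.zip"
if (!dir.exists("../data/UCI HAR Dataset")) {
  download.file(zip_url, destfile = zip_file, mode = "wb")
  unzip(zip_file, exdir = "../data")  # creates ../data/UCI HAR Dataset/
}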
Next, we will transpose the data and convert it into a matrix. MXNet expects each observation to be a column rather than a row, so the data is laid out as width x height rather than the height x width layout used by most R functions:
train.x <- t(train.x[,meanSD])
test.x <- t(test.x[,meanSD])
train.x <- data.matrix(train.x)
test.x <- data.matrix(test.x)
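As a quick sanity check (not in the original script), you can confirm that each column is now one observation and each row one of the selected features:
# Expect 66 mean/std features as rows; columns are the observations
# (7,352 in the training set and 2,947 in the test set if everything loaded correctly)
dim(train.x)
dim(test.x)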
The next step is to define the computation graph. We create a placeholder for the data and two fully connected (or dense) layers, each followed by a ReLU activation. The first layer has 64 nodes and the second has 32 nodes. We then create a final fully connected layer with six nodes, the number of distinct classes in our y variable, and apply a softmax activation to convert the outputs of those six nodes into probabilities for each class:
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=64)
act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")
fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=32)
act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu")
fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=6)
softmax <- mx.symbol.SoftmaxOutput(fc3, name="sm")
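To make the role of the final softmax concrete, here is a small base R sketch (purely illustrative, not part of the MXNet graph) showing how six raw output scores are converted into class probabilities that sum to one:
# Illustrative softmax: exponentiate the scores and normalize them
softmax_probs <- function(scores) {
  e <- exp(scores - max(scores))  # subtracting the max improves numerical stability
  e / sum(e)
}
softmax_probs(c(2.1, 0.3, -1.0, 0.5, 0.0, 1.2))  # six scores -> six probabilities summing to 1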
When you run the previous code, nothing actually executes; we have only defined the graph. To train the model, we first create a devices object to indicate where the code should run, CPU or GPU. We then pass the symbol for the last layer (softmax) into the mx.model.FeedForward.create function. This function takes a number of other parameters, more properly known as hyper-parameters. These include the number of epochs (num.round), which controls how many times we pass through the data; the learning rate (learning.rate), which controls how much the weights are updated by the gradients on each pass; momentum (momentum), which can help the model train faster; and the weight initializer (initializer), which controls how the weights and biases of the nodes are initially set. We also pass in an evaluation metric (eval.metric), which is how the model is to be evaluated, and a callback function (epoch.end.callback), which is used to output progress information. When we run the function, it trains the model and outputs the progress as per the value we passed to epoch.end.callback, namely once every epoch:
devices <- mx.cpu()
mx.set.seed(0)
tic <- proc.time()
model <- mx.model.FeedForward.create(softmax, X = train.x, y = train.y,
                                     ctx = devices, num.round = 20,
                                     learning.rate = 0.08, momentum = 0.9,
                                     eval.metric = mx.metric.accuracy,
                                     initializer = mx.init.uniform(0.01),
                                     epoch.end.callback =
                                       mx.callback.log.train.metric(1))
Start training with 1 devices
[1] Train-accuracy=0.185581140350877
[2] Train-accuracy=0.26104525862069
[3] Train-accuracy=0.555091594827586
[4] Train-accuracy=0.519127155172414
[5] Train-accuracy=0.646551724137931
[6] Train-accuracy=0.733836206896552
[7] Train-accuracy=0.819100215517241
[8] Train-accuracy=0.881869612068966
[9] Train-accuracy=0.892780172413793
[10] Train-accuracy=0.908674568965517
[11] Train-accuracy=0.898572198275862
[12] Train-accuracy=0.896821120689655
[13] Train-accuracy=0.915544181034483
[14] Train-accuracy=0.928879310344828
[15] Train-accuracy=0.926993534482759
[16] Train-accuracy=0.934401939655172
[17] Train-accuracy=0.933728448275862
[18] Train-accuracy=0.934132543103448
[19] Train-accuracy=0.933324353448276
[20] Train-accuracy=0.934132543103448
print(proc.time() - tic)
user system elapsed
7.31 3.03 4.31
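Training takes only a few seconds on a CPU. If you want to keep the trained network for later reuse, the mxnet package includes save and load helpers; the following is a minimal sketch, assuming the current mxnet R API (the file prefix is arbitrary):
# Sketch: persist the trained model to disk and reload it later
# (assumes mx.model.save/mx.model.load as provided by the mxnet R package)
mx.model.save(model, prefix = "uci_har_model", iteration = 20)
model <- mx.model.load(prefix = "uci_har_model", iteration = 20)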
Now that we have trained our model, let's see how it does on the test set:
preds1 <- predict(model, test.x)
pred.label <- max.col(t(preds1)) - 1
t <- table(data.frame(cbind(test.y, pred.label)),
           dnn = c("Actual", "Predicted"))
acc <- round(100.0 * sum(diag(t)) / length(test.y), 2)
print(t)
      Predicted
Actual   0   1   2   3   4   5
     0 477  15   4   0   0   0
     1 108 359   4   0   0   0
     2  13  42 365   0   0   0
     3   0   0   0 454  37   0
     4   0   0   0 141 391   0
     5   0   0   0  16   0 521
print(sprintf(" Deep Learning Model accuracy = %1.2f%%",acc))
[1] " Deep Learning Model accuracy = 87.11%"
Not bad! We have achieved an accuracy of 87.11% on our test set.
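The confusion matrix also shows where the remaining errors concentrate: classes 1 and 4 account for most of the misclassifications. Reusing the table t computed above, a short snippet gives the per-class recall:
# Per-class recall: correct predictions on the diagonal divided by the row totals
per_class_recall <- round(100 * diag(t) / rowSums(t), 2)
print(per_class_recall)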