
Vectorizing features
The following function takes our training data of 25,000 lists of integers, where each list is a review, and returns a one-hot encoded vector for each of those integer lists. We then redefine our training and test features by using this function to transform our integer lists into a 2D tensor of one-hot encoded review vectors:
import numpy as np

def vectorize_features(features):
    # Define the number of total words in our corpus and
    # make an empty 2D tensor of shape (25000, 12000)
    dimension = 12000
    review_vectors = np.zeros((len(features), dimension))
    # Iterate over each review and set the indices of our
    # empty tensor that correspond to its words to 1s
    for location, feature in enumerate(features):
        review_vectors[location, feature] = 1
    return review_vectors

x_train = vectorize_features(x_train)
x_test = vectorize_features(x_test)
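The line `review_vectors[location, feature] = 1` relies on NumPy's integer-array indexing: passing a list of column indices sets all of those positions in the row at once, and any duplicate indices simply write 1 to the same position again. The following is a minimal sketch of the same function on a made-up toy input (the `toy_reviews` data and the smaller `dimension` are illustrative, not from the IMDB dataset):

```python
import numpy as np

def vectorize_features(features, dimension=12000):
    review_vectors = np.zeros((len(features), dimension))
    for location, feature in enumerate(features):
        # Integer-array indexing: all listed columns
        # in this row are set to 1 in one assignment
        review_vectors[location, feature] = 1
    return review_vectors

# Two tiny "reviews" as lists of word indices; note the repeated 7
toy_reviews = [[3, 7, 7, 42], [0, 1]]
vectors = vectorize_features(toy_reviews, dimension=50)

print(vectors.shape)           # (2, 50)
print(vectors[0, [3, 7, 42]])  # [1. 1. 1.]
print(vectors[0].sum())        # 3.0 -- the duplicate 7 collapses to a single 1
```

This also illustrates why one-hot encoding discards word counts: a word that appears several times in a review still contributes only a single 1 to its vector.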
You can see the result of our transformations by checking the type and shape of our training features and labels. You can also check what one individual vector looks like, as shown in the following code. We can see that each of our reviews is now a vector of length 12000:
type(x_train),x_train.shape, y_train.shape
(numpy.ndarray, (25000, 12000), (25000,))
x_train[0].shape, x_train[0]
((12000,), array([0., 1., 1., ..., 0., 0., 0.]))