Raspberry Pi 3 Cookbook for Python Programmers

How to do it...

  1. Include the following lines in a new Python file to load the training data for five newsgroup categories:
from sklearn.datasets import fetch_20newsgroups 
category_mapping = {'misc.forsale': 'Sellings', 'rec.motorcycles': 'Motorbikes', 
        'rec.sport.baseball': 'Baseball', 'sci.crypt': 'Cryptography', 
        'sci.space': 'OuterSpace'} 
 
# Load only the five selected newsgroups from the training split
training_content = fetch_20newsgroups(subset='train',
        categories=list(category_mapping.keys()), shuffle=True, random_state=7)
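`fetch_20newsgroups` stores each document's category as an integer in `training_content.target`, and that integer indexes into `training_content.target_names`. The following is a minimal pure-Python sketch of how such an index maps to the friendly labels in `category_mapping`. It assumes, as in recent scikit-learn releases, that `target_names` lists the selected newsgroups in sorted order. The `friendly_label` helper and the predicted labels are hypothetical, used only for illustration.

```python
# Friendly labels used in this recipe
category_mapping = {'misc.forsale': 'Sellings', 'rec.motorcycles': 'Motorbikes',
                    'rec.sport.baseball': 'Baseball', 'sci.crypt': 'Cryptography',
                    'sci.space': 'OuterSpace'}

# Assumption: target_names holds the selected newsgroups in sorted order,
# and every value in target is an index into this list.
target_names = sorted(category_mapping)

def friendly_label(target_index):
    """Hypothetical helper: numeric label -> newsgroup name -> friendly name."""
    return category_mapping[target_names[target_index]]

# Hypothetical numeric labels, standing in for classifier output
for label in [2, 3, 1]:
    print(label, target_names[label], friendly_label(label))
```

The same two-step lookup (`target_names[category]`, then `category_mapping[...]`) appears in the recipe's final printing step.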
  2. Extract features from the text by counting how often each word occurs in each document:
from sklearn.feature_extraction.text import CountVectorizer 
 
vectorizing = CountVectorizer() 
train_counts = vectorizing.fit_transform(training_content.data) 
# Rows are documents, columns are vocabulary words
print("\nDimensions of training data:", train_counts.shape)
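To see what `CountVectorizer` produces, the following is a simplified, pure-Python sketch of a bag-of-words count matrix. It is not scikit-learn's implementation. It assumes the vectorizer's documented defaults: lowercase text, tokens of two or more word characters, and a vocabulary whose terms are sorted alphabetically and mapped to column indices. The function name `fit_transform_counts` and the two sample sentences are made up for this example. Unlike scikit-learn, it returns a dense list of lists rather than a sparse matrix.

```python
import re

# Approximates CountVectorizer's default token pattern: 2+ word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    """Lowercase the text and split it into tokens."""
    return TOKEN_PATTERN.findall(text.lower())

def fit_transform_counts(documents):
    """Return (vocabulary, count_matrix) for a list of documents."""
    # Every distinct token, sorted, mapped to a column index
    terms = sorted({tok for doc in documents for tok in tokenize(doc)})
    vocabulary = {term: col for col, term in enumerate(terms)}
    # One row per document, one column per vocabulary term
    matrix = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for tok in tokenize(doc):
            row[vocabulary[tok]] += 1
        matrix.append(row)
    return vocabulary, matrix

docs = ["The pitcher threw a curveball",
        "The cipher hides the message"]
vocab, counts = fit_transform_counts(docs)
print("Dimensions:", (len(counts), len(vocab)))  # (2, 7)
print(counts)  # [[0, 1, 0, 0, 1, 1, 1], [1, 0, 1, 1, 0, 2, 0]]
```

The single-letter word "a" is dropped by the token pattern, and "the" gets a count of 2 in the second row. `train_counts.shape` in the recipe has the same meaning: (number of documents, vocabulary size).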
  3. Define some test sentences and convert the training word counts into TF-IDF weights:
from sklearn.naive_bayes import MultinomialNB 
from sklearn.feature_extraction.text import TfidfTransformer 
 
input_content = [ 
    "The curveballs of right handed pitchers tend to curve to the left", 
    "Caesar cipher is an ancient form of encryption", 
    "This two-wheeler is really good on slippery roads" 
] 
 
tfidf_transformer = TfidfTransformer() 
train_tfidf = tfidf_transformer.fit_transform(train_counts) 
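The following is a pure-Python sketch of the weighting `TfidfTransformer` applies. It assumes the transformer's documented defaults: smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, multiplied by the raw counts and followed by L2 normalisation of each row. It is an illustration, not the library's code. The name `tfidf` and the toy count matrix are hypothetical.

```python
import math

def tfidf(count_matrix):
    """TF-IDF weighting with smoothed idf and L2 row normalisation."""
    n_docs = len(count_matrix)
    n_terms = len(count_matrix[0])
    # Document frequency: how many documents contain each term
    df = [sum(1 for row in count_matrix if row[j] > 0) for j in range(n_terms)]
    # Smoothed idf, as if one extra document contained every term once
    idf = [math.log((1 + n_docs) / (1 + df[j])) + 1 for j in range(n_terms)]
    weighted = []
    for row in count_matrix:
        values = [row[j] * idf[j] for j in range(n_terms)]
        # Scale each row to unit Euclidean length (leave all-zero rows alone)
        norm = math.sqrt(sum(v * v for v in values)) or 1.0
        weighted.append([v / norm for v in values])
    return idf, weighted

# Toy counts: column 5 appears in both documents, the others in only one
counts = [[0, 1, 0, 0, 1, 1, 1],
          [1, 0, 1, 1, 0, 2, 0]]
idf, weights = tfidf(counts)
print([round(v, 3) for v in idf])
```

A term that occurs in every document (column 5 here) gets the minimum idf of 1.0. Rarer terms are weighted up, which is why TF-IDF tends to emphasise topic-specific words over common ones.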
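  4. Train the Multinomial Naive Bayes classifier, then transform the input sentences with the same vectorizer and TF-IDF transformer used for the training data: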
classifier = MultinomialNB().fit(train_tfidf, training_content.target) 
input_counts = vectorizing.transform(input_content) 
input_tfidf = tfidf_transformer.transform(input_counts) 
  5. Predict the output categories:
categories_prediction = classifier.predict(input_tfidf) 
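The training and prediction calls above can be illustrated with a small from-scratch sketch of multinomial Naive Bayes. It assumes `MultinomialNB`'s documented defaults: additive smoothing with `alpha=1.0` and class priors estimated from the training labels. It is a simplified illustration, not scikit-learn's implementation. The function names and the three-feature toy data are hypothetical. The feature values may be raw counts or TF-IDF weights, as in the recipe.

```python
import math

def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log class priors and smoothed log feature probabilities."""
    classes = sorted(set(y))
    n_features = len(X[0])
    log_prior, log_prob = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        # Total weight of each feature in class c, plus additive smoothing
        totals = [sum(r[k] for r in rows) + alpha for k in range(n_features)]
        denom = sum(totals)
        log_prob[c] = [math.log(t / denom) for t in totals]
    return classes, log_prior, log_prob

def predict(model, X):
    """Return the class with the highest joint log-likelihood for each row."""
    classes, log_prior, log_prob = model
    predictions = []
    for x in X:
        # log P(c) + sum_k x_k * log P(feature k | c)
        scores = {c: log_prior[c] + sum(v * lp for v, lp in zip(x, log_prob[c]))
                  for c in classes}
        predictions.append(max(scores, key=scores.get))
    return predictions

# Toy data: class 0 uses feature 0 heavily, class 1 uses features 1 and 2
X_train = [[3, 0, 0], [2, 1, 0], [0, 3, 1], [0, 2, 2]]
y_train = [0, 0, 1, 1]
model = fit_multinomial_nb(X_train, y_train)
print(predict(model, [[1, 0, 0], [0, 1, 1]]))  # [0, 1]
```

The smoothing term keeps a word that never appeared in a class from driving that class's probability to zero. This matters for short input sentences like the ones in this recipe.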
  6. Print the output:
for sentence, category in zip(input_content, categories_prediction): 
    # Numeric label -> newsgroup name -> friendly category name
    print('\nInput:', sentence, '\nPredicted category:',
          category_mapping[training_content.target_names[category]])

The following screenshot shows the category predicted for each input sentence: