How to do it...
- Create a new Python file and load the training data for the selected newsgroup categories:
from sklearn.datasets import fetch_20newsgroups

category_mapping = {'misc.forsale': 'Sellings', 'rec.motorcycles': 'Motorbikes',
        'rec.sport.baseball': 'Baseball', 'sci.crypt': 'Cryptography',
        'sci.space': 'OuterSpace'}
training_content = fetch_20newsgroups(subset='train',
        categories=list(category_mapping.keys()), shuffle=True, random_state=7)
- Extract features by counting how often each word occurs in the training text:
from sklearn.feature_extraction.text import CountVectorizer

vectorizing = CountVectorizer()
train_counts = vectorizing.fit_transform(training_content.data)
print("\nDimensions of training data:", train_counts.shape)
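To make the dimensions printed in this step concrete, here is a minimal pure-Python sketch of a word-count matrix. It is not scikit-learn's implementation. The helper names `tokenize` and `count_matrix` and the two toy sentences are made up for illustration, and the sketch assumes `CountVectorizer`'s default behavior of lowercasing the text and keeping only tokens of two or more word characters.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase, then keep tokens of two or more word characters
    # (an assumption mirroring CountVectorizer's default token pattern).
    return re.findall(r"\b\w\w+\b", text.lower())


def count_matrix(documents):
    # Build a sorted vocabulary, then one row of term counts per document.
    token_lists = [tokenize(doc) for doc in documents]
    vocabulary = sorted({token for tokens in token_lists for token in tokens})
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


# Hypothetical toy corpus, not taken from the 20 newsgroups data.
toy_documents = [
    "The pitcher threw a curve ball",
    "The cipher hides the message",
]

vocabulary, rows = count_matrix(toy_documents)
print("Vocabulary:", vocabulary)
print("Counts:", rows)
print("Dimensions:", (len(rows), len(vocabulary)))
```

Each row corresponds to one document and each column to one vocabulary term, which is the same layout that `train_counts.shape` reports for the real training data.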
- Define the input sentences to classify, then convert the training counts to TF-IDF weights:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer

input_content = [
    "The curveballs of right handed pitchers tend to curve to the left",
    "Caesar cipher is an ancient form of encryption",
    "This two-wheeler is really good on slippery roads"
]

tfidf_transformer = TfidfTransformer()
train_tfidf = tfidf_transformer.fit_transform(train_counts)
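The TF-IDF step reweights raw counts so that words appearing in many documents contribute less. The following pure-Python sketch shows one way to compute such weights, assuming the `TfidfTransformer` defaults of a smoothed inverse document frequency, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization of each row. The function name `tfidf_rows` and the toy counts are hypothetical.

```python
import math


def tfidf_rows(count_rows):
    n_docs = len(count_rows)
    n_terms = len(count_rows[0])
    # Document frequency: how many documents contain each term.
    doc_freq = [sum(1 for row in count_rows if row[j] > 0) for j in range(n_terms)]
    # Smoothed inverse document frequency (assumed default formula).
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    result = []
    for row in count_rows:
        weighted = [count * weight for count, weight in zip(row, idf)]
        # Scale each row to unit Euclidean length.
        norm = math.sqrt(sum(value * value for value in weighted))
        result.append([value / norm if norm else 0.0 for value in weighted])
    return result


# Hypothetical counts for the terms ["ball", "cipher", "the"] in two documents.
toy_counts = [
    [2, 0, 1],
    [0, 1, 3],
]

tfidf = tfidf_rows(toy_counts)
for row in tfidf:
    print([round(value, 3) for value in row])
```

In this toy matrix, "the" occurs in both documents and therefore receives the smallest IDF factor, while "ball" and "cipher" each occur in only one document and are weighted up.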
- Train the Multinomial Naive Bayes classifier, then transform the input sentences with the same fitted vectorizer and TF-IDF transformer:
classifier = MultinomialNB().fit(train_tfidf, training_content.target)
input_counts = vectorizing.transform(input_content)
input_tfidf = tfidf_transformer.transform(input_counts)
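For readers who want to see what a multinomial Naive Bayes classifier computes, here is a small pure-Python sketch. It is a simplification, not scikit-learn's code: it works on raw token counts rather than TF-IDF weights, it assumes additive smoothing with `alpha=1.0`, and the functions `train_multinomial_nb` and `predict_label` as well as the four toy documents are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(documents, labels, alpha=1.0):
    # documents: list of token lists; labels: parallel list of class names.
    vocabulary = sorted({token for doc in documents for token in doc})
    docs_per_class = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(documents, labels):
        term_counts[label].update(doc)
    model = {}
    for label, n_docs in docs_per_class.items():
        # Smoothed denominator: class token total plus alpha per vocabulary term.
        total = sum(term_counts[label].values()) + alpha * len(vocabulary)
        model[label] = {
            "log_prior": math.log(n_docs / len(documents)),
            "log_likelihood": {
                term: math.log((term_counts[label][term] + alpha) / total)
                for term in vocabulary
            },
        }
    return model


def predict_label(model, tokens):
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for token in tokens:
            # Tokens never seen during training are skipped.
            if token in params["log_likelihood"]:
                score += params["log_likelihood"][token]
        scores[label] = score
    # Return the class with the highest log-probability score.
    return max(scores, key=scores.get)


# Hypothetical toy training set.
toy_docs = [
    ["pitcher", "curve", "ball"],
    ["home", "run", "ball"],
    ["cipher", "encryption", "key"],
    ["secret", "key", "cipher"],
]
toy_labels = ["Baseball", "Baseball", "Cryptography", "Cryptography"]

toy_model = train_multinomial_nb(toy_docs, toy_labels)
print(predict_label(toy_model, ["ancient", "cipher", "encryption"]))
print(predict_label(toy_model, ["curve", "ball"]))
```

The sketch scores each class by adding the log prior to the log likelihood of every known token and picks the highest score. Working in log space avoids multiplying many small probabilities together.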
- Predict the output categories:
categories_prediction = classifier.predict(input_tfidf)
- Print the output:
for sentence, category in zip(input_content, categories_prediction):
    print('\nInput:', sentence, '\nPredicted category:',
            category_mapping[training_content.target_names[category]])
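The lookup in the print step chains two mappings: the predicted integer indexes into `target_names`, and the resulting newsgroup name is the key into `category_mapping`. The sketch below isolates that lookup. It assumes `target_names` lists the five selected newsgroups in alphabetical order, and the values in `example_predictions` are hypothetical stand-ins for classifier output, not actual results.

```python
# Assumed ordering of training_content.target_names (alphabetical).
target_names = [
    'misc.forsale',
    'rec.motorcycles',
    'rec.sport.baseball',
    'sci.crypt',
    'sci.space',
]

category_mapping = {
    'misc.forsale': 'Sellings',
    'rec.motorcycles': 'Motorbikes',
    'rec.sport.baseball': 'Baseball',
    'sci.crypt': 'Cryptography',
    'sci.space': 'OuterSpace',
}

# Hypothetical class indices standing in for classifier.predict() output.
example_predictions = [2, 3, 1]

# Index -> newsgroup name -> human-readable label.
readable_labels = [category_mapping[target_names[index]] for index in example_predictions]
print(readable_labels)
```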
The following screenshot shows the category predicted for each input sentence: