Replacing negations with antonyms
The opposite of synonym replacement is antonym replacement. An antonym is a word that has the opposite meaning of another word. This time, instead of creating custom word mappings, we can use WordNet to replace words with unambiguous antonyms. Refer to the Looking up lemmas and synonyms in WordNet recipe in Chapter 1, Tokenizing Text and WordNet Basics, for more details on antonym lookups.
How to do it...
Let's say you have a sentence like "let's not uglify our code". With antonym replacement, you can replace "not uglify" with "beautify", resulting in the sentence "let's beautify our code". To do this, we will create an AntonymReplacer class in replacers.py as follows:
from nltk.corpus import wordnet

class AntonymReplacer(object):
    def replace(self, word, pos=None):
        antonyms = set()

        for syn in wordnet.synsets(word, pos=pos):
            for lemma in syn.lemmas():
                for antonym in lemma.antonyms():
                    antonyms.add(antonym.name())

        if len(antonyms) == 1:
            return antonyms.pop()
        else:
            return None

    def replace_negations(self, sent):
        i, l = 0, len(sent)
        words = []

        while i < l:
            word = sent[i]

            if word == 'not' and i+1 < l:
                ant = self.replace(sent[i+1])

                if ant:
                    words.append(ant)
                    i += 2
                    continue

            words.append(word)
            i += 1

        return words
Now, we can tokenize the original sentence into ["let's", 'not', 'uglify', 'our', 'code'] and pass this to the replace_negations() method. Here are some examples:
>>> from replacers import AntonymReplacer
>>> replacer = AntonymReplacer()
>>> replacer.replace('good')
>>> replacer.replace('uglify')
'beautify'
>>> sent = ["let's", 'not', 'uglify', 'our', 'code']
>>> replacer.replace_negations(sent)
["let's", 'beautify', 'our', 'code']
How it works...
The AntonymReplacer class has two methods: replace() and replace_negations(). The replace() method takes a single word and an optional part-of-speech tag, then looks up the Synsets for the word in WordNet. Going through all the Synsets and every lemma of each Synset, it creates a set of all antonyms found. If only one antonym is found, then it is an unambiguous replacement. If there is more than one antonym, which can happen quite often, then we don't know for sure which antonym is correct. In the case of multiple antonyms (or no antonyms), replace() returns None as it cannot make a decision.
In replace_negations(), we look through a tokenized sentence for the word "not". If "not" is found and is not the last token, then we try to find an antonym for the next word using replace(). If we find an antonym, then it is appended to the list of words, replacing both "not" and the original word. All other words are appended as is, resulting in a tokenized sentence with unambiguous negations replaced by their antonyms.
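The control flow described above can be tried out without WordNet. The following is a minimal sketch under stated assumptions, not the recipe's implementation: the CANDIDATES table is a hand-written stand-in for the WordNet lookup with purely illustrative entries, and the names unambiguous_antonym and replace_negations_with are hypothetical helpers introduced only for this example.

```python
# Hypothetical stand-in for WordNet: each word maps to the set of antonym
# candidates a lookup might return. The entries are illustrative only.
CANDIDATES = {
    'uglify': {'beautify'},
    'good': {'bad', 'evil', 'ill'},  # several candidates, so ambiguous
}


def unambiguous_antonym(word, candidates=CANDIDATES):
    """Return the antonym only when exactly one candidate exists."""
    found = candidates.get(word, set())
    if len(found) == 1:
        return next(iter(found))
    return None


def replace_negations_with(lookup, sent):
    """Replace 'not <word>' with lookup(<word>) whenever lookup succeeds."""
    words = []
    i = 0
    while i < len(sent):
        word = sent[i]
        if word == 'not' and i + 1 < len(sent):
            ant = lookup(sent[i + 1])
            if ant:
                # Skip both 'not' and the negated word.
                words.append(ant)
                i += 2
                continue
        # Either not a negation, or no unambiguous antonym: keep the token.
        words.append(word)
        i += 1
    return words


replaced = replace_negations_with(
    unambiguous_antonym, ["let's", 'not', 'uglify', 'our', 'code'])
kept = replace_negations_with(
    unambiguous_antonym, ['this', 'is', 'not', 'good'])
print(replaced)  # ["let's", 'beautify', 'our', 'code']
print(kept)      # ['this', 'is', 'not', 'good']
```

In the second call the lookup finds three candidates, so it returns None and the sentence is left untouched, including the word "not".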
There's more...
As unambiguous antonyms aren't very common in WordNet, you might want to create a custom antonym mapping in the same way we did for synonyms. This AntonymWordReplacer can be constructed by inheriting from both WordReplacer and AntonymReplacer:
class AntonymWordReplacer(WordReplacer, AntonymReplacer):
    pass
The order of inheritance is very important, as we want the initialization and replace() method of WordReplacer combined with the replace_negations() method from AntonymReplacer. Because WordReplacer is listed first, its replace() takes precedence over the one defined in AntonymReplacer. The result is a replacer that can perform the following:
>>> from replacers import AntonymWordReplacer
>>> replacer = AntonymWordReplacer({'evil': 'good'})
>>> replacer.replace_negations(['good', 'is', 'not', 'evil'])
['good', 'is', 'good']
Of course, you can also inherit from CsvWordReplacer or YamlWordReplacer instead of WordReplacer if you want to load the antonym word mappings from a file.
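Returning to the order of inheritance, the sketch below shows how Python's method resolution order decides which replace() is called. It uses simplified stand-ins rather than the real classes. The shape of WordReplacer (a constructor taking a word_map, and a replace() that returns unknown words unchanged) is an assumption based on the previous recipe. AntonymReplacer.replace() is stubbed to always return None so that no WordNet data is needed. The ReversedReplacer class is hypothetical and exists only for contrast.

```python
class WordReplacer(object):
    # Assumed shape of the class from the previous recipe.
    def __init__(self, word_map):
        self.word_map = word_map

    def replace(self, word):
        # Assumption: words missing from the mapping are returned unchanged.
        return self.word_map.get(word, word)


class AntonymReplacer(object):
    def replace(self, word, pos=None):
        # Stub standing in for the WordNet lookup; it never finds an antonym.
        return None

    def replace_negations(self, sent):
        i, l = 0, len(sent)
        words = []
        while i < l:
            word = sent[i]
            if word == 'not' and i + 1 < l:
                ant = self.replace(sent[i + 1])
                if ant:
                    words.append(ant)
                    i += 2
                    continue
            words.append(word)
            i += 1
        return words


class AntonymWordReplacer(WordReplacer, AntonymReplacer):
    pass


class ReversedReplacer(AntonymReplacer, WordReplacer):
    # Hypothetical class with the bases swapped, for comparison only.
    pass


mro = [cls.__name__ for cls in AntonymWordReplacer.__mro__]
print(mro)
# ['AntonymWordReplacer', 'WordReplacer', 'AntonymReplacer', 'object']

sent = ['good', 'is', 'not', 'evil']
mapped = AntonymWordReplacer({'evil': 'good'}).replace_negations(sent)
unmapped = ReversedReplacer({'evil': 'good'}).replace_negations(sent)
print(mapped)    # ['good', 'is', 'good']
print(unmapped)  # ['good', 'is', 'not', 'evil']

# Caveat under the fallback assumption above: an unmapped word is returned
# as is, which is truthy, so 'not' is dropped even without a mapping.
dropped = AntonymWordReplacer({'evil': 'good'}).replace_negations(['not', 'happy'])
print(dropped)   # ['happy']
```

With the bases swapped, the stubbed AntonymReplacer.replace() wins and the custom mapping is never consulted. The last call illustrates a caveat that holds only if WordReplacer.replace() really does fall back to the original word: a negated word with no mapping would lose its "not", so it may be worth making the mapping lookup return None for unknown words when it is used for negations.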
See also
The previous recipe covers the WordReplacer from the perspective of synonym replacement. In Chapter 1, Tokenizing Text and WordNet Basics, WordNet usage is covered in detail in the Looking up Synsets for a word in WordNet and Looking up lemmas and synonyms in WordNet recipes.