Python 3 Text Processing with NLTK 3 Cookbook

Replacing negations with antonyms

The opposite of synonym replacement is antonym replacement. An antonym is a word that has the opposite meaning of another word. This time, instead of creating custom word mappings, we can use WordNet to replace words with unambiguous antonyms. Refer to the Looking up lemmas and synonyms in WordNet recipe in Chapter 1, Tokenizing Text and WordNet Basics, for more details on antonym lookups.

How to do it...

Let's say you have a sentence like let's not uglify our code. With antonym replacement, you can replace not uglify with beautify, resulting in the sentence let's beautify our code. To do this, we will create an AntonymReplacer class in replacers.py as follows:

from nltk.corpus import wordnet

class AntonymReplacer(object):
  def replace(self, word, pos=None):
    # Collect every antonym of every lemma in every Synset of the word.
    antonyms = set()
    for syn in wordnet.synsets(word, pos=pos):
      for lemma in syn.lemmas():
        for antonym in lemma.antonyms():
          antonyms.add(antonym.name())
    # Only a single distinct antonym counts as an unambiguous replacement.
    if len(antonyms) == 1:
      return antonyms.pop()
    else:
      return None

  def replace_negations(self, sent):
    i, l = 0, len(sent)
    words = []
    while i < l:
      word = sent[i]
      # Try to collapse "not <word>" into the antonym of <word>.
      if word == 'not' and i+1 < l:
        ant = self.replace(sent[i+1])
        if ant:
          words.append(ant)
          i += 2  # skip both 'not' and the replaced word
          continue
      words.append(word)
      i += 1
    return words

Now, we can tokenize the original sentence into ["let's", 'not', 'uglify', 'our', 'code'] and pass this list to the replace_negations() method. Here are some examples:

>>> from replacers import AntonymReplacer
>>> replacer = AntonymReplacer()
>>> replacer.replace('good')
>>> replacer.replace('uglify')
'beautify'
>>> sent = ["let's", 'not', 'uglify', 'our', 'code']
>>> replacer.replace_negations(sent)
["let's", 'beautify', 'our', 'code']
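Note that replace_negations() only recognizes the exact token not. The example list above keeps let's whole, as a plain whitespace split would, but a Treebank-style tokenizer such as NLTK's word_tokenize() splits a contraction like don't into do and n't, and a capitalized Not would not match either. A minimal pre-pass, sketched below, could fold such tokens into not before calling replace_negations(); normalize_negations() and the NEGATION_TOKENS set are hypothetical names, and the set of negation spellings is an assumption you would extend for your own data.

```python
from typing import List

# Assumed, non-exhaustive set of negation spellings to fold into 'not'.
NEGATION_TOKENS = {"not", "n't"}


def normalize_negations(tokens: List[str]) -> List[str]:
    """Rewrite negation tokens (case-insensitively) as 'not' (hypothetical helper)."""
    return ["not" if tok.lower() in NEGATION_TOKENS else tok for tok in tokens]


print(normalize_negations(["Do", "n't", "uglify", "our", "code"]))
# ['Do', 'not', 'uglify', 'our', 'code']
print(normalize_negations(["Not", "bad"]))
# ['not', 'bad']
```

The output of this pass can be handed straight to replacer.replace_negations().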

How it works...

The AntonymReplacer class has two methods: replace() and replace_negations(). The replace() method takes a single word and an optional part-of-speech tag, then looks up the Synsets for the word in WordNet. Going through all the Synsets and every lemma of each Synset, it creates a set of all antonyms found. If only one antonym is found, then it is an unambiguous replacement. If there is more than one antonym, which can happen quite often, then we don't know for sure which antonym is correct. In the case of multiple antonyms (or no antonyms), replace() returns None as it cannot make a decision.
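The decision rule in replace() can be checked without WordNet. The sketch below is an illustration under stated assumptions, not the recipe's code: TOY_SENSES is an invented lexicon (not WordNet's actual data) in which each word maps to a list of senses and each sense lists its antonyms, and unambiguous_antonym() is a hypothetical stand-in for replace(). It shows that the same antonym appearing in several senses still counts as one candidate, because the set collapses duplicates.

```python
from typing import Dict, List, Optional

# Invented toy data: word -> list of senses, each sense -> list of antonyms.
TOY_SENSES: Dict[str, List[List[str]]] = {
    "uglify": [["beautify"]],
    "lift": [["drop"], [], ["drop"]],  # same antonym in two senses
    "light": [["dark"], ["heavy"]],    # different antonyms per sense
    "table": [[], []],                 # no antonyms at all
}


def unambiguous_antonym(word: str) -> Optional[str]:
    """Return the single distinct antonym across all senses, else None."""
    antonyms = set()
    for sense in TOY_SENSES.get(word, []):
        antonyms.update(sense)
    # Zero candidates: nothing to substitute; two or more: ambiguous.
    return antonyms.pop() if len(antonyms) == 1 else None


for w in ["uglify", "lift", "light", "table", "unknown"]:
    print(w, unambiguous_antonym(w))
# uglify beautify
# lift drop
# light None
# table None
# unknown None
```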

In replace_negations(), we look through a tokenized sentence for the word not. If not is found, then we try to find an antonym for the next word using replace(). If we find an antonym, then it is appended to the list of words, replacing not and the original word. All other words are appended as is, resulting in a tokenized sentence with unambiguous negations replaced by their antonyms.
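Because the scanning loop only needs some way to map a word to an antonym or None, its edge cases can be exercised without WordNet. The following is a standalone sketch, not the recipe's class: it takes a hypothetical lookup callable (here just dict.get on an invented mapping) and shows a replaced negation, a negation with no known antonym (left untouched), and a trailing not with nothing after it.

```python
from typing import Callable, List, Optional


def replace_negations(sent: List[str],
                      lookup: Callable[[str], Optional[str]]) -> List[str]:
    """Replace 'not <word>' with lookup(word) whenever lookup returns a value."""
    words: List[str] = []
    i = 0
    while i < len(sent):
        if sent[i] == 'not' and i + 1 < len(sent):
            ant = lookup(sent[i + 1])
            if ant:
                words.append(ant)
                i += 2  # consume both 'not' and the negated word
                continue
        words.append(sent[i])
        i += 1
    return words


toy_antonyms = {'uglify': 'beautify'}  # invented mapping for illustration
print(replace_negations(["let's", 'not', 'uglify', 'our', 'code'], toy_antonyms.get))
# ["let's", 'beautify', 'our', 'code']
print(replace_negations(['do', 'not', 'panic'], toy_antonyms.get))
# ['do', 'not', 'panic']
print(replace_negations(['why', 'not'], toy_antonyms.get))
# ['why', 'not']
```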

There's more...

As unambiguous antonyms aren't very common in WordNet, you might want to create a custom antonym mapping in the same way we did for synonyms. Such an AntonymWordReplacer can be constructed by inheriting from both WordReplacer and AntonymReplacer:

class AntonymWordReplacer(WordReplacer, AntonymReplacer):
  pass

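The order of inheritance is very important. Python looks up methods along the class's method resolution order, from left to right, so listing WordReplacer first means its initialization and replace() method take precedence, while replace_negations(), which only AntonymReplacer defines, is inherited from AntonymReplacer. The result is a replacer that can perform the following: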

>>> from replacers import AntonymWordReplacer
>>> replacer = AntonymWordReplacer({'evil': 'good'})
>>> replacer.replace_negations(['good', 'is', 'not', 'evil'])
['good', 'is', 'good']
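Both the inheritance order and one side effect of it can be checked with a self-contained sketch. Here WordReplacer is a minimal stand-in that assumes the previous recipe's behaviour (return the mapped word, or the word itself when there is no mapping), and AntonymReplacer.replace() is stubbed to return None instead of querying WordNet. ReversedReplacer and StrictAntonymWordReplacer are hypothetical names. Under that assumption, an unmapped word comes back unchanged and is truthy, so replace_negations() drops the not in front of it: not bad silently becomes bad. Overriding replace() to return None for unmapped words is one possible fix.

```python
class WordReplacer(object):
    # Stand-in for the previous recipe's class (assumed fallback: word unchanged).
    def __init__(self, word_map):
        self.word_map = word_map

    def replace(self, word):
        return self.word_map.get(word, word)


class AntonymReplacer(object):
    # WordNet lookup stubbed out so this sketch runs without NLTK data.
    def replace(self, word, pos=None):
        return None

    def replace_negations(self, sent):
        i, l = 0, len(sent)
        words = []
        while i < l:
            word = sent[i]
            if word == 'not' and i + 1 < l:
                ant = self.replace(sent[i + 1])  # resolved along the MRO
                if ant:
                    words.append(ant)
                    i += 2
                    continue
            words.append(word)
            i += 1
        return words


class AntonymWordReplacer(WordReplacer, AntonymReplacer):
    pass


class ReversedReplacer(AntonymReplacer, WordReplacer):
    # Wrong order: replace() now comes from the stubbed AntonymReplacer.
    pass


class StrictAntonymWordReplacer(AntonymWordReplacer):
    # Hypothetical fix: unmapped words yield None, so 'not' is kept.
    def replace(self, word, pos=None):
        return self.word_map.get(word)


print([cls.__name__ for cls in AntonymWordReplacer.__mro__])
# ['AntonymWordReplacer', 'WordReplacer', 'AntonymReplacer', 'object']

antonyms = {'evil': 'good'}
sent = ['good', 'is', 'not', 'evil']
print(AntonymWordReplacer(antonyms).replace_negations(sent))  # ['good', 'is', 'good']
print(ReversedReplacer(antonyms).replace_negations(sent))     # ['good', 'is', 'not', 'evil']

other = ['this', 'is', 'not', 'bad']
print(AntonymWordReplacer(antonyms).replace_negations(other))        # ['this', 'is', 'bad']
print(StrictAntonymWordReplacer(antonyms).replace_negations(other))  # ['this', 'is', 'not', 'bad']
```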

Of course, you can also inherit from CsvWordReplacer or YamlWordReplacer instead of WordReplacer if you want to load the antonym word mappings from a file.
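Whichever base class you use, the file ultimately has to become a word-to-antonym dictionary. The sketch below assumes a two-column CSV layout with one word,antonym pair per row; load_antonym_map() is a hypothetical helper, not part of replacers.py, and an in-memory io.StringIO stands in for a real file such as one opened with open('antonyms.csv', newline=''). The resulting dictionary could then be passed to AntonymWordReplacer.

```python
import csv
import io
from typing import Dict, Iterable


def load_antonym_map(lines: Iterable[str]) -> Dict[str, str]:
    """Build a word -> antonym dict from two-column CSV rows (hypothetical helper)."""
    word_map: Dict[str, str] = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        word, antonym = row  # assumes exactly two columns per row
        word_map[word.strip()] = antonym.strip()
    return word_map


csv_text = io.StringIO("evil,good\nuglify,beautify\n")
print(load_antonym_map(csv_text))
# {'evil': 'good', 'uglify': 'beautify'}
```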

See also

The previous recipe covers the WordReplacer from the perspective of synonym replacement. In Chapter 1, Tokenizing Text and WordNet Basics, WordNet usage is covered in detail in the Looking up Synsets for a word in WordNet and Looking up lemmas and synonyms in WordNet recipes.