Python 3 Text Processing with NLTK 3 Cookbook
上QQ阅读APP看书,第一时间看更新

Looking up lemmas and synonyms in WordNet

Building on the previous recipe, we can also look up lemmas in WordNet to find synonyms of a word. A lemma (in linguistics), is the canonical form or morphological form of a word.

How to do it...

In the following code, we'll find that there are two lemmas for the cookbook Synset using the lemmas() method:

>>> from nltk.corpus import wordnet
>>> syn = wordnet.synsets('cookbook')[0]
>>> lemmas = syn.lemmas()
>>> len(lemmas)
2
>>> lemmas[0].name()
'cookbook'
>>> lemmas[1].name()
'cookery_book'
>>> lemmas[0].synset() == lemmas[1].synset()
True

How it works...

As you can see, cookery_book and cookbook are two distinct lemmas in the same Synset. In fact, a lemma can only belong to a single Synset. In this way, a Synset represents a group of lemmas that all have the same meaning, while a lemma represents a distinct word form.

There's more...

Since all the lemmas in a Synset have the same meaning, they can be treated as synonyms. So if you wanted to get all synonyms for a Synset, you could do the following:

>>> [lemma.name() for lemma in syn.lemmas()]
['cookbook', 'cookery_book']

All possible synonyms

As mentioned earlier, many words have multiple Synsets because the word can have different meanings depending on the context. But, let's say you didn't care about the context, and wanted to get all the possible synonyms for a word:

>>> synonyms = []
>>> for syn in wordnet.synsets('book'):
...     for lemma in syn.lemmas():
...         synonyms.append(lemma.name())
>>> len(synonyms)
38

As you can see, there appears to be 38 possible synonyms for the word 'book'. But in fact, some synonyms are verb forms, and many synonyms are just different usages of 'book'. If, instead, we take the set of synonyms, there are fewer unique words, as shown in the following code:

>>> len(set(synonyms))
25

Antonyms

Some lemmas also have antonyms. The word good, for example, has 27 Synsets, five of which have lemmas with antonyms, as shown in the following code:

>>> gn2 = wordnet.synset('good.n.02')
>>> gn2.definition()
'moral excellence or admirableness'
>>> evil = gn2.lemmas()[0].antonyms()[0]
>>> evil.name
'evil'
>>> evil.synset().definition()
'the quality of being morally wrong in principle or practice'
>>> ga1 = wordnet.synset('good.a.01')
>>> ga1.definition()
'having desirable or positive qualities especially those suitable for a thing specified'
>>> bad = ga1.lemmas()[0].antonyms()[0]
>>> bad.name()
'bad'
>>> bad.synset().definition()
'having undesirable or negative qualities'

The antonyms() method returns a list of lemmas. In the first case, as we can see in the previous code, the second Synset for good as a noun is defined as moral excellence, and its first antonym is evil, defined as morally wrong. In the second case, when good is used as an adjective to describe positive qualities, the first antonym is bad, which describes negative qualities.

See also

In the next recipe, we'll learn how to calculate Synset similarity. Then in Chapter 2, Replacing and Correcting Words, we'll revisit lemmas for lemmatization, synonym replacement, and antonym replacement.