How to do it...
Let's go through the following steps:
- Open a command-line window. We used the Windows cmd program in this example.
- Add the OpenNLP tool's bin directory to your PATH, and then navigate to the directory containing the en-lemmatizer.dict file.
- Execute the following command:
opennlp LemmatizerTrainerME -model en-lemmatizer.bin -lang en -data en-lemmatizer.dict -encoding UTF-8
You will get the following output. It has been shortened here to save space:
Indexing events with TwoPass using cutoff of 5
Computing event counts... done. 301403 events
Indexing... done.
Sorting and merging events... done. Reduced 301403 events to 297777.
Done indexing in 9.09 s.
Incorporating indexed data for training...
done.
Number of Event Tokens: 297777
Number of Outcomes: 432
Number of Predicates: 69122
...done.
Computing model parameters ...
Performing 100 iterations.
1: ... loglikelihood=-1829041.6775780176 3.317817009120679E-6
2: ... loglikelihood=-452333.43760414346 0.876829361353404
3: ... loglikelihood=-211099.05280473927 0.9506806501594212
4: ... loglikelihood=-132195.3981804198 0.9667554735686108
...
98: ... loglikelihood=-6702.5821153954375 0.9988420818638168
99: ... loglikelihood=-6652.6134177562335 0.998845399680826
100: ... loglikelihood=-6603.518040975329 0.9988553531318534
Writing lemmatizer model
... done (1.274s)
Wrote lemmatizer model to
path: C:\Downloads\OpenNLP\en-lemmatizer.bin
Execution time: 275.369 seconds
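Each numbered line in the output reports one training iteration: the iteration number, the log-likelihood, and a final value that we take to be the fraction of training events the model predicted correctly at that point (an assumption about the trainer's output, not something stated above). The following is a minimal sketch for pulling those numbers out of a saved copy of the log. The TrainingLogParser and Iteration names are hypothetical helpers written for this illustration and are not part of OpenNLP:

```java
/** Hypothetical helper that parses iteration lines from the trainer's console output. */
final class TrainingLogParser {

    /** One "N: ... loglikelihood=X Y" line from the log. */
    static final class Iteration {
        final int number;
        final double logLikelihood;
        final double accuracy; // assumed meaning: share of training events predicted correctly

        Iteration(int number, double logLikelihood, double accuracy) {
            this.number = number;
            this.logLikelihood = logLikelihood;
            this.accuracy = accuracy;
        }
    }

    private static final String MARKER = "loglikelihood=";

    /** Returns the parsed iteration, or null if the line is not an iteration line. */
    static Iteration parse(String line) {
        String text = line.trim();
        int colon = text.indexOf(':');
        int marker = text.indexOf(MARKER);
        // An iteration line starts with "<number>:" and has the marker after the colon.
        if (colon <= 0 || marker < colon) {
            return null;
        }
        // After the marker we expect two whitespace-separated numbers.
        String[] values = text.substring(marker + MARKER.length()).trim().split("\\s+");
        if (values.length != 2) {
            return null;
        }
        try {
            return new Iteration(
                    Integer.parseInt(text.substring(0, colon).trim()),
                    Double.parseDouble(values[0]),
                    Double.parseDouble(values[1]));
        } catch (NumberFormatException e) {
            return null; // looked like an iteration line but the numbers were malformed
        }
    }

    public static void main(String[] args) {
        // Sample lines copied from the output shown above.
        String[] log = {
            "Performing 100 iterations.",
            "1: ... loglikelihood=-1829041.6775780176 3.317817009120679E-6",
            "100: ... loglikelihood=-6603.518040975329 0.9988553531318534"
        };
        for (String line : log) {
            Iteration it = parse(line);
            if (it != null) {
                System.out.printf("iteration %d: log-likelihood %.2f, accuracy %.4f%n",
                        it.number, it.logLikelihood, it.accuracy);
            }
        }
    }
}
```

Lines that are not iteration reports, such as the indexing messages or the path line, yield null and are skipped, so the whole log can be fed through parse line by line. Comparing the first and last parsed iterations gives a quick check that the log-likelihood rose toward zero over the 100 iterations.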