Natural Language Processing with Java Cookbook

How to do it...

The necessary steps include the following:

  1. Add the following import statement to the top of the class file:
import java.text.BreakIterator;
  1. Next, add the declaration for text as a static variable of the class:
private static String text =
    "We will start with a simple sentence. However, is it "
    + "possible for a sentence to end with a question "
    + "mark? Obviously that is possible! Another "
    + "complication is the use of a number such as 56.32 "
    + "or ellipses such as ... Ellipses may be found ... "
    + "with a sentence! Of course, we may also find the "
    + "use of abbreviations such as Mr. Smith or "
    + "Dr. Jones.";
  1. Add the following code to the main method to set up the BreakIterator instance:
BreakIterator breakIterator = BreakIterator.getSentenceInstance();
breakIterator.setText(text);
  1. Next, add the following code, which uses the BreakIterator instance to find and display the sentences:
int startPosition = breakIterator.first();
int endingPosition = breakIterator.first();
while (true) {
    endingPosition = breakIterator.next();
    if (endingPosition == BreakIterator.DONE) {
        break;
    } else {
        System.out.println(startPosition + "-" +
            endingPosition + " [" +
            text.substring(startPosition, endingPosition) + "]");
        startPosition = endingPosition;
    }
}
  1. Execute the program. You will get the following output:
0-38 [We will start with a simple sentence. ]
38-106 [However, is it possible for a sentence to end with a question mark? ]
106-135 [Obviously that is possible! ]
135-217 [Another complication is the use of a number such as 56.32 or ellipses such as ... ]
217-260 [Ellipses may be found ... with a sentence! ]
260-325 [Of course, we may also find the use of abbreviations such as Mr. ]
325-338 [Smith or Dr. ]
338-344 [Jones.]
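
The loop in the preceding steps can also be packaged as a reusable method. The following is a minimal sketch and not part of the recipe: the class name SentenceSplitter and the method name split are hypothetical, and Locale.US is an assumed locale passed to BreakIterator.getSentenceInstance(Locale). Instead of printing each segment, it collects the segments into a list:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class; the name is an assumption, not from the recipe.
final class SentenceSplitter {

    // Returns every segment that BreakIterator delimits, in order.
    // Trailing whitespace is kept, so joining the segments yields the input.
    static List<String> split(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);

        List<String> sentences = new ArrayList<>();
        int start = iterator.first();
        // next() returns BreakIterator.DONE once the last boundary is passed.
        for (int end = iterator.next();
                end != BreakIterator.DONE;
                start = end, end = iterator.next()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample text chosen for illustration only.
        String sample = "Hello world. How are you? Fine!";
        for (String sentence : split(sample, Locale.US)) {
            System.out.println("[" + sentence.trim() + "]");
        }
    }
}
```

Because this method only repackages the same BreakIterator boundaries, it splits after abbreviations such as Mr. and Dr. in the same way as the output shown above.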