Apache OpenNLP is a machine learning toolkit for processing natural language
text. This article sets up a simple Maven project and runs a simple program
that uses OpenNLP.
Add the following to the Maven configuration (pom.xml):
<!-- Open NLP -->
<dependency>
<groupId>org.apache.opennlp</groupId>
<artifactId>opennlp-tools</artifactId>
<version>1.5.3</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.4</version>
</dependency>
Write a Java program to detect sentences:
The high-level steps are as follows:
- Open the model file en-sent.bin as an InputStream.
- Create a SentenceModel from the model stream, and a SentenceDetectorME from that model.
- Get the array of sentences using the OpenNLP API.
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

public class SentenceDetectorClient {

    public static void main(String[] args) {
        new SentenceDetectorClient().go();
    }

    private void go() {
        // Load the en-sent.bin model file for sentence detection;
        // try-with-resources closes the stream when done.
        try (InputStream modelIn = new FileInputStream("src/main/resources/models/en-sent.bin")) {
            SentenceModel sModel = new SentenceModel(modelIn);
            // Create a sentence detector based on the loaded model.
            SentenceDetectorME sentenceDetector = new SentenceDetectorME(sModel);
            String articleText = "Chris Gayle on Monday sounded out a warning to the rival "
                + "teams ahead of the World Twenty20 by declaring that he can score a hundred "
                + "irrespective of the conditions. “I am capable of scoring a century in any "
                + "condition and on any wicket in the world. I just want to give the team that "
                + "kind of a start. It will be nice to get another hundred,” Gayle said. “However "
                + "it also depends on the conditions as well and how the wicket is playing,” he "
                + "said. Asked about the tremendous pressure on him to perform every time, when he "
                + "goes out to bat, the Jamaican dasher said it indeed was a challenge to live up "
                + "to the expectations. “It creates a lot of pressure as expectations are rising. "
                + "When you actually set a trend, then people expect you to come good at all "
                + "times. You have fans worldwide who want me to do well. That’s what they pay for "
                + "and want to see. But it’s not going to happen all the time but when I do get a "
                + "chance I try to entertain people as much as possible,” he said. “We are here to "
                + "retain the title and that’s not going to be easy but we are ready for it and we "
                + "are ready for the challenges. Our first priority is to make it to the last "
                + "four, it’s a tough group. Everybody is looking to win the tournament.”";
            // Split the article into sentences.
            String[] sentences = sentenceDetector.sentDetect(articleText);
            for (int i = 0; i < sentences.length; i++) {
                // Print each sentence with a 1-based index.
                System.out.println("Sentence : " + (i + 1) + " " + sentences[i]);
            }
        } catch (Exception e) {
            System.out.println("Exception : " + e);
        }
    }
}
Output:
Sentence : 1 Chris Gayle on Monday sounded out a warning to the rival teams ahead of the World Twenty20 by declaring that he can score a hundred irrespective of the conditions.
Sentence : 2 “I am capable of scoring a century in any condition and on any wicket in the world.
Sentence : 3 I just want to give the team that kind of a start.
Sentence : 4 It will be nice to get another hundred,” Gayle said.
Sentence : 5 “However it also depends on the conditions as well and how the wicket is playing,” he said.
Sentence : 6 Asked about the tremendous pressure on him to perform every time, when he goes out to bat, the Jamaican dasher said it indeed was a challenge to live up to the expectations.
Sentence : 7 “It creates a lot of pressure as expectations are rising.
Sentence : 8 When you actually set a trend, then people expect you to come good at all times.
Sentence : 9 You have fans worldwide who want me to do well.
Sentence : 10 That’s what they pay for and want to see.
Sentence : 11 But it’s not going to happen all the time but when I do get a chance I try to entertain people as much as possible,” he said.
Sentence : 12 “We are here to retain the title and that’s not going to be easy but we are ready for it and we are ready for the challenges.
Sentence : 13 Our first priority is to make it to the last four, it’s a tough group.
Sentence : 14 Everybody is looking to win the tournament.”
-----------------------------------------------------------------------------
Similarly, the following code splits the same article into tokens:
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

// TokenizerClient is an illustrative class name wrapping the snippet.
public class TokenizerClient {
    public static void main(String[] args) throws Exception {
        // Load the en-token.bin model file; try-with-resources closes the stream.
        try (InputStream modelIn = new FileInputStream("src/main/resources/models/en-token.bin")) {
            TokenizerModel tModel = new TokenizerModel(modelIn);
            TokenizerME tokenizer = new TokenizerME(tModel);
            String articleText = "Chris Gayle on Monday sounded out a warning to the rival teams ahead of the World Twenty20 by declaring that he can score a hundred irrespective of the conditions. “I am capable of scoring a century in any condition and on any wicket in the world. I just want to give the team that kind of a start. It will be nice to get another hundred,” Gayle said. “However it also depends on the conditions as well and how the wicket is playing,” he said. Asked about the tremendous pressure on him to perform every time, when he goes out to bat, the Jamaican dasher said it indeed was a challenge to live up to the expectations. “It creates a lot of pressure as expectations are rising. When you actually set a trend, then people expect you to come good at all times. You have fans worldwide who want me to do well. That’s what they pay for and want to see. But it’s not going to happen all the time but when I do get a chance I try to entertain people as much as possible,” he said. “We are here to retain the title and that’s not going to be easy but we are ready for it and we are ready for the challenges. Our first priority is to make it to the last four, it’s a tough group. Everybody is looking to win the tournament.”";
            String[] tokens = tokenizer.tokenize(articleText);
            // Join the tokens, appending "|" after each one.
            StringBuilder tokenString = new StringBuilder();
            for (String token : tokens) {
                tokenString.append(token).append("|");
            }
            // tokens.length is the number of tokens in the array.
            System.out.println("No. of tokens : " + tokens.length);
            System.out.println(tokenString);
        }
    }
}
Output:
No. of tokens : 267
Chris|Gayle|on|Monday|sounded|out|a|warning|to|the|rival|teams|ahead|of|the|World|Twenty20|by|declaring|that|he|can|score|a|hundred|irrespective|of|the|conditions|.|“|I|am|capable|of|scoring|a|century|in|any|condition|and|on|any|wicket|in|the|world|.|I|just|want|to|give|the|team|that|kind|of|a|start|.|It|will|be|nice|to|get|another|hundred|,|”|Gayle|said|.|“However|it|also|depends|on|the|conditions|as|well|and|how|the|wicket|is|playing|,|”|he|said|.|Asked|about|the|tremendous|pressure|on|him|to|perform|every|time|,|when|he|goes|out|to|bat|,|the|Jamaican|dasher|said|it|indeed|was|a|challenge|to|live|up|to|the|expectations|.|“It|creates|a|lot|of|pressure|as|expectations|are|rising|.|When|you|actually|set|a|trend|,|then|people|expect|you|to|come|good|at|all|times|.|You|have|fans|worldwide|who|want|me|to|do|well|.|That’s|what|they|pay|for|and|want|to|see|.|But|it|’s|not|going|to|happen|all|the|time|but|when|I|do|get|a|chance|I|try|to|entertain|people|as|much|as|possible|,|”|he|said|.|“We|are|here|to|retain|the|title|and|that|’s|not|going|to|be|easy|but|we|are|ready|for|it|and|we|are|ready|for|the|challenges|.|Our|first|priority|is|to|make|it|to|the|last|four|,|it|’s|a|tough|group|.|Everybody|is|looking|to|win|the|tournament|.|”|
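When reporting a token count, it is easy to confuse the number of tokens with the length of the pipe-joined string, which also counts every character and every "|". The following is a minimal sketch in plain Java (no OpenNLP required) that shows the difference. The class name TokenCountSketch and the helper joinWithPipes are made up for illustration, and the token array is hard-coded from the first four tokens of the output above rather than produced by the tokenizer.

```java
// Illustrative sketch only: TokenCountSketch and joinWithPipes are hypothetical
// names, and the tokens are hard-coded instead of coming from TokenizerME.
class TokenCountSketch {

    // Appends "|" after every token, matching the format of the output above.
    static String joinWithPipes(String[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            sb.append(token).append('|');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"Chris", "Gayle", "on", "Monday"};
        String joined = joinWithPipes(tokens);

        // Number of entries in the array: 4
        System.out.println("No. of tokens : " + tokens.length);
        // Number of characters in the joined string: 22
        System.out.println("Joined length : " + joined.length());
        // Chris|Gayle|on|Monday|
        System.out.println(joined);
    }
}
```

Because each token is followed by exactly one "|", the joined length is always the total number of token characters plus the number of tokens, so it grows much faster than the token count.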