Named Entity Recognition (NER) identifies meaningful pieces of information in
text, such as the names of people, places and organizations.
There are several ways to extract named entities from text; the example below
uses the Stanford NLP library (Stanford CoreNLP).
Make sure the required Stanford NLP jars are on the classpath before running
the example. If you are using Maven, the following dependencies can be used.
<!-- Stanford NLP -->
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.2.0</version>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.2.0</version>
    <classifier>models</classifier>
</dependency>
<dependency>
    <groupId>com.io7m.xom</groupId>
    <artifactId>xom</artifactId>
    <version>1.2.10</version>
</dependency>
<dependency>
    <groupId>joda-time</groupId>
    <artifactId>joda-time</artifactId>
    <version>2.1</version>
</dependency>
<dependency>
    <groupId>de.jollyday</groupId>
    <artifactId>jollyday</artifactId>
    <version>0.4.7</version>
</dependency>
<dependency>
    <groupId>com.googlecode.efficient-java-matrix-library</groupId>
    <artifactId>ejml</artifactId>
    <version>0.23</version>
</dependency>
The high-level steps are:
1. Specify the annotators (and therefore the models) that the pipeline should load.
2. Create a StanfordCoreNLP object from those properties.
3. Create an Annotation from the text and run the pipeline on it.
4. Get the sentences of the article.
5. Get the words (tokens) of each sentence.
6. For each word, obtain its NER tag using the NLP API.
Code:
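import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class NERClient {

    static final String ARTICLE = "A day after resigning as Navy Chief in New Delhi, "
            + "Admiral D.K. Joshi on Thursday wrote to his colleagues, saying he was "
            + "“firm” on taking responsibility for the mishaps that have taken place. ";

    private StanfordCoreNLP pipeline;

    public static void main(String[] args) {
        NERClient client = new NERClient();
        client.go();
    }

    private void go() {
        // Annotators to run, in order. "ner" needs tokenize, ssplit, pos and lemma;
        // "parse" is not required for NER and can be dropped to speed up startup.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
        pipeline = new StanfordCoreNLP(props);

        // Wrap the text in an Annotation and run all annotators over it.
        Annotation annotation = new Annotation(ARTICLE);
        pipeline.annotate(annotation);

        // Walk the sentences, then the tokens of each sentence.
        List<CoreMap> sentences = annotation.get(SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            List<CoreLabel> tokens = sentence.get(TokensAnnotation.class);
            System.out.println(tokens.toString());
            for (CoreLabel token : tokens) {
                String word = token.get(TextAnnotation.class);
                String ner = token.get(NamedEntityTagAnnotation.class);
                String pos = token.get(PartOfSpeechAnnotation.class);
                System.out.print(word + "(" + ner + ")" + " ");
                // System.out.println("pos :" + pos);
            }
        }
    }
}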
Output:
[A, day, after, resigning, as, Navy, Chief, in, New, Delhi, ,, Admiral, D.K., Joshi, on, Thursday, wrote, to, his, colleagues, ,, saying, he, was, ``, firm, '', on, taking, responsibility, for, the, mishaps, that, have, taken, place, .]
A(DURATION) day(DURATION) after(O) resigning(O) as(O) Navy(ORGANIZATION) Chief(O) in(O) New(LOCATION) Delhi(LOCATION) ,(O) Admiral(O) D.K.(PERSON) Joshi(PERSON) on(O) Thursday(DATE) wrote(O) to(O) his(O) colleagues(O) ,(O) saying(O) he(O) was(O) ``(O) firm(O) ''(O) on(O) taking(O) responsibility(O) for(O) the(O) mishaps(O) that(O) have(O) taken(O) place(O) .(O)
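The output above tags each token separately, so a multi-word entity such as New Delhi shows up as two LOCATION tokens. One possible follow-up step is to join adjacent tokens that carry the same tag. The sketch below is a plain-Java illustration of that idea and is not part of the Stanford NLP API: the class EntityMerger and the method mergeEntities are hypothetical names, and it assumes you have already copied the words and NER tags from the CoreLabel tokens into two parallel lists with no null tags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Stanford NLP class): merges per-token NER tags
// into multi-word entities.
public class EntityMerger {

    /** Joins runs of adjacent tokens that share the same tag; "O" tokens are skipped. */
    static List<String> mergeEntities(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<String> entities = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            // Same non-"O" tag as the previous token: extend the open entity.
            if (!"O".equals(tag) && tag.equals(currentTag)) {
                current.append(' ').append(words.get(i));
                continue;
            }
            // Tag changed: close the open entity, if there is one.
            if (!"O".equals(currentTag)) {
                entities.add(current + " [" + currentTag + "]");
            }
            current.setLength(0);
            if (!"O".equals(tag)) {
                current.append(words.get(i));
            }
            currentTag = tag;
        }
        // Close an entity that runs to the end of the input.
        if (!"O".equals(currentTag)) {
            entities.add(current + " [" + currentTag + "]");
        }
        return entities;
    }

    public static void main(String[] args) {
        // Words and tags copied by hand from part of the output above.
        List<String> words = Arrays.asList(
                "in", "New", "Delhi", ",", "Admiral", "D.K.", "Joshi", "on", "Thursday");
        List<String> tags = Arrays.asList(
                "O", "LOCATION", "LOCATION", "O", "O", "PERSON", "PERSON", "O", "DATE");
        // prints [New Delhi [LOCATION], D.K. Joshi [PERSON], Thursday [DATE]]
        System.out.println(mergeEntities(words, tags));
    }
}
```

Note that this simple approach also merges two different entities of the same type when they sit directly next to each other, so treat it as a starting point rather than a complete solution.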
Hope this helps someone