Friday, February 28, 2014

NER using Stanford NLP

Named Entity Recognition (NER) helps identify meaningful information in textual content.
There are different ways to extract named entities (places, person names, organizations) from text; the example below uses the Stanford NLP (CoreNLP) library.

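Make sure the required Stanford NLP jars are on the classpath before running the example below. If you are using Maven, the following dependencies can be used (the "models" classifier jar contains the trained models, including the NER classifiers).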

<!-- Stanford NLP -->
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.2.0</version>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.2.0</version>
    <classifier>models</classifier>
</dependency>
<dependency>
    <groupId>com.io7m.xom</groupId>
    <artifactId>xom</artifactId>
    <version>1.2.10</version>
</dependency>
<dependency>
    <groupId>joda-time</groupId>
    <artifactId>joda-time</artifactId>
    <version>2.1</version>
</dependency>
<dependency>
    <groupId>de.jollyday</groupId>
    <artifactId>jollyday</artifactId>
    <version>0.4.7</version>
</dependency>
<dependency>
    <groupId>com.googlecode.efficient-java-matrix-library</groupId>
    <artifactId>ejml</artifactId>
    <version>0.23</version>
</dependency>

The high-level steps are:

1. Specify the annotators the pipeline should run (tokenize, ssplit, pos, lemma, ner, parse) in a Properties object.
2. Create the StanfordCoreNLP pipeline from those properties.
3. Wrap the article text in an Annotation and run the pipeline on it.
4. Get the sentences of the article.
5. Get the tokens (words) of each sentence.
6. For each token, read its named-entity tag through the CoreNLP API.

Code: 

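import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class NERClient {

    static final String ARTICLE = "A day after resigning as Navy Chief in New Delhi, Admiral D.K. Joshi on Thursday wrote to his colleagues, saying he was “firm” on taking responsibility for the mishaps that have taken place. ";

    private StanfordCoreNLP pipeline = null;

    public static void main(String[] args) {
        NERClient sc = new NERClient();
        sc.go();
    }

    private void go() {
        // Choose the annotators and build the pipeline.
        // "ner" needs tokenize, ssplit, pos and lemma; "parse" is not required for NER.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
        pipeline = new StanfordCoreNLP(props);

        // Wrap the text in an Annotation and run every annotator on it.
        Annotation annotation = new Annotation(ARTICLE);
        pipeline.annotate(annotation);

        // Sentences produced by the "ssplit" annotator.
        List<CoreMap> sentences = annotation.get(SentencesAnnotation.class);

        for (CoreMap sentence : sentences) {
            // Tokens (words) of this sentence.
            List<CoreLabel> tokens = sentence.get(TokensAnnotation.class);
            System.out.println(tokens);

            // Print each word followed by its NER tag ("O" means not an entity).
            for (CoreLabel token : tokens) {
                String word = token.get(TextAnnotation.class);
                String ner = token.get(NamedEntityTagAnnotation.class);
                // token.get(CoreAnnotations.PartOfSpeechAnnotation.class) would return the POS tag.
                System.out.print(word + "(" + ner + ")" + "  ");
            }
            System.out.println();
        }
    }
}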

Output:
[A, day, after, resigning, as, Navy, Chief, in, New, Delhi, ,, Admiral, D.K., Joshi, on, Thursday, wrote, to, his, colleagues, ,, saying, he, was, ``, firm, '', on, taking, responsibility, for, the, mishaps, that, have, taken, place, .]

A(DURATION)  day(DURATION)  after(O)  resigning(O)  as(O)  Navy(ORGANIZATION)  Chief(O)  in(O)  New(LOCATION)  Delhi(LOCATION)  ,(O)  Admiral(O)  D.K.(PERSON)  Joshi(PERSON)  on(O)  Thursday(DATE)  wrote(O)  to(O)  his(O)  colleagues(O)  ,(O)  saying(O)  he(O)  was(O)  ``(O)  firm(O)  ''(O)  on(O)  taking(O)  responsibility(O)  for(O)  the(O)  mishaps(O)  that(O)  have(O)  taken(O)  place(O)  .(O)  
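The tags above are per token, so a multi-word name such as New Delhi shows up as two LOCATION tokens. If you want whole entities instead, one simple approach is to merge runs of adjacent tokens that share the same non-O tag. The sketch below is plain Java with no CoreNLP dependency; the class name EntityGrouper and its group method are hypothetical, and the word/tag lists are copied from the output above. It assumes plain tags as printed here (no B-/I- prefixes), so two different entities of the same type sitting right next to each other would be merged into one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of CoreNLP): merges runs of adjacent tokens
// that share the same non-"O" NER tag into one entity mention,
// e.g. New(LOCATION) Delhi(LOCATION) becomes "New Delhi [LOCATION]".
public class EntityGrouper {

    public static List<String> group(List<String> words, List<String> tags) {
        List<String> entities = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            if (!tag.equals(currentTag)) {
                // Tag changed: close the previous run if it was an entity.
                flush(entities, current, currentTag);
                currentTag = tag;
            }
            if (!tag.equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words.get(i));
            }
        }
        // Close a run that reaches the end of the sentence.
        flush(entities, current, currentTag);
        return entities;
    }

    private static void flush(List<String> entities, StringBuilder current, String tag) {
        if (!tag.equals("O") && current.length() > 0) {
            entities.add(current + " [" + tag + "]");
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        // Tokens and tags copied from the first part of the output above.
        List<String> words = Arrays.asList("A", "day", "after", "resigning", "as",
                "Navy", "Chief", "in", "New", "Delhi", ",", "Admiral", "D.K.", "Joshi",
                "on", "Thursday", "wrote");
        List<String> tags = Arrays.asList("DURATION", "DURATION", "O", "O", "O",
                "ORGANIZATION", "O", "O", "LOCATION", "LOCATION", "O", "O", "PERSON", "PERSON",
                "O", "DATE", "O");
        for (String entity : group(words, tags)) {
            System.out.println(entity);
        }
    }
}
```

Running main prints A day [DURATION], Navy [ORGANIZATION], New Delhi [LOCATION], D.K. Joshi [PERSON] and Thursday [DATE], one per line. Inside the NERClient loop you could collect the word and ner strings of each sentence into two lists and pass them to group(...).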

Hope this helps someone.
