Named Entity Recognition (NER) model

This SageMaker model uses spaCy to power an API for extracting named entities from text.

The model can identify the following entities:

  • PERSON People, including fictional.
  • NORP Nationalities or religious or political groups.
  • FAC Buildings, airports, highways, bridges, etc.
  • ORG Companies, agencies, institutions, etc.
  • GPE Countries, cities, states.
  • LOC Non-GPE locations, mountain ranges, bodies of water.
  • PRODUCT Objects, vehicles, foods, etc. (Not services.)
  • EVENT Named hurricanes, battles, wars, sports events, etc.
  • WORK_OF_ART Titles of books, songs, etc.
  • LAW Named documents made into laws.
  • LANGUAGE Any named language.
  • DATE Absolute or relative dates or periods.
  • TIME Times smaller than a day.
  • PERCENT Percentage, including “%”.
  • MONEY Monetary values, including unit.
  • QUANTITY Measurements, as of weight or distance.
  • ORDINAL “first”, “second”, etc.
  • CARDINAL Numerals that do not fall under another type.

The model currently only supports English.

Once provisioned, you can query the model as a REST API or through the SageMaker API.

For a complete example, see the NER Colab notebook.

Example SageMaker query:

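Note that your endpoint name might differ from this example. With AWS CLI version 2, you may also need to add --cli-binary-format raw-in-base64-out so that the JSON body is not treated as base64-encoded.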

aws sagemaker-runtime invoke-endpoint --endpoint-name sigmodata-ner --body '{ "input": ["I went to Paris last week"] }' --content-type "application/json" out.txt
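
The same request can also be sent from Python. The following is a minimal sketch, assuming boto3 is installed and AWS credentials and a region are already configured. The helper names build_request and query_ner are illustrative and not part of the model's API; the payload shape and the endpoint name are taken from the CLI example above.

```python
import json


def build_request(texts):
    """Serialize a list of strings into the JSON body used in the CLI example."""
    return json.dumps({"input": list(texts)}).encode("utf-8")


def query_ner(texts, endpoint_name="sigmodata-ner"):
    """Send texts to the endpoint and return the decoded JSON response.

    Assumes boto3 is installed and credentials/region are configured.
    The default endpoint name comes from the CLI example; yours may differ.
    """
    import boto3  # imported lazily so build_request works without boto3

    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_request(texts),
    )
    # "Body" is a streaming object; read it fully before decoding the JSON.
    return json.loads(response["Body"].read())
```

Calling query_ner(["I went to Paris last week"]) should then return a dictionary in the response format documented in this section, without writing an intermediate file.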

 

The CLI command saves the results to out.txt, which will look like this:

{
  "results": [
    [
      {
        "end": 15,
        "label": "GPE",    
        "start": 10,       
        "text": "Paris"    
      },
      {
        "end": 25,
        "label": "DATE",   
        "start": 16,       
        "text": "last week"
      }
    ]
  ]
}
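
In the sample response, each inner list of "results" appears to correspond to one input string, and "start" and "end" appear to be character offsets into that string, with "end" exclusive (characters 10 to 15 of the input spell "Paris"). A minimal sketch under those assumptions, which groups the entity texts by label; entities_by_label is an illustrative helper name, not part of the model's API:

```python
import json
from collections import defaultdict


def entities_by_label(response, texts):
    """Return one {label: [entity text, ...]} dict per input string.

    Assumes response["results"][i] holds the entities for texts[i], and
    that start/end are character offsets with end exclusive.
    """
    grouped = []
    for text, entities in zip(texts, response["results"]):
        by_label = defaultdict(list)
        for ent in entities:
            # Sanity check: the offsets should slice out the entity text.
            if text[ent["start"]:ent["end"]] != ent["text"]:
                raise ValueError(f"offset mismatch for {ent['text']!r}")
            by_label[ent["label"]].append(ent["text"])
        grouped.append(dict(by_label))
    return grouped


# The sample response from this section, as it would be read from out.txt.
sample = json.loads("""
{"results": [[
  {"end": 15, "label": "GPE", "start": 10, "text": "Paris"},
  {"end": 25, "label": "DATE", "start": 16, "text": "last week"}
]]}
""")

print(entities_by_label(sample, ["I went to Paris last week"]))
# prints [{'GPE': ['Paris'], 'DATE': ['last week']}]
```

The offset check is optional; it is included so that a mismatch between the assumed offset convention and the actual response is reported early instead of silently producing wrong spans.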