#NuGet package for the #linguistic analysis API

Hi,

As you might have seen, the Linguistic Analysis API of Azure Cognitive Services is available as part of the language category. It allows you to perform part-of-speech (POS) tagging, which is basically a way to identify each word and its role within a piece of text.

I find POS tagging particularly useful whenever you want to capture the essence of a phrase. I’ve used it a few times to simplify user search queries and build dynamic queries programmatically. Whatever use you make of POS tagging, Microsoft’s current implementation has a small shortcoming: it never returns tokens & tags grouped together. To give you a concrete example, here is a screenshot of all possible results (at the time of writing):

[Screenshot: linguisticapi]

So, as you can see, tokens & tags must be parsed separately and “manually” regrouped to match each original word with its corresponding tag. The result of the analyzer with ID 22a… (in the middle) contains both combined, but it’s not the most obvious format to parse. To help with this, I created a NuGet package that not only facilitates token/tag extraction but also returns the raw results from the Azure API, in case the format of its answers changes over time (which is likely, since it is still a preview API).
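To illustrate the manual regrouping described above, here is a minimal sketch of pairing tokens with tags by position. It is not the package's implementation: PosPairing and Pair are hypothetical names, and the sketch assumes the tokenizer and POS tagger results have already been deserialized into one string array per sentence, with tokens and tags aligned by index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PosPairing
{
    // Pairs each token with the tag found at the same position, sentence by sentence.
    // Assumption: tokens[s] and tags[s] describe the same sentence and have equal length.
    public static List<(string Token, string Tag)> Pair(string[][] tokens, string[][] tags)
    {
        if (tokens.Length != tags.Length)
            throw new ArgumentException("Token and tag results cover a different number of sentences.");

        var pairs = new List<(string Token, string Tag)>();
        for (int s = 0; s < tokens.Length; s++)
        {
            if (tokens[s].Length != tags[s].Length)
                throw new ArgumentException($"Token and tag counts differ in sentence {s}.");

            // Zip walks both arrays in lockstep and emits one (token, tag) tuple per position.
            pairs.AddRange(tokens[s].Zip(tags[s], (token, tag) => (token, tag)));
        }
        return pairs;
    }
}
```

Calling PosPairing.Pair(tokens, tags) on two such arrays yields a single flat list that can be walked with a foreach. The length checks are there because a silent mismatch between the two analyzers would otherwise attach tags to the wrong words.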

On top of getting a single list of tokens & tags, the library comes with handy functions such as GetVerbs(), GetNouns() and GetAdjectives(). A single AnalyzeTextAsync method is provided to perform three kinds of analysis:

  • Raw analysis, almost as if you called the API yourself
// Create the client with your subscription key.
LinguisticsClient cli = new LinguisticsClient(
    "your key");
// Each GUID selects one analyzer of the API.
AnalyzeTextResult res = await cli.AnalyzeTextAsync(new AnalyzeTextRequest(
    "I live in Brussels and you?", new Guid[] {
    new Guid("4fa79af1-f22c-408d-98bb-b7d7aeef7f04"), // POS tags
    new Guid("22a6b758-420f-4745-8a3c-46835a67c0d2"), // constituency tree
    new Guid("08ea174b-bfdb-4e64-987e-602f85da7f72")})) as AnalyzeTextResult; // tokens
  • An extraction of tokens & tags at the same time
AnalyzeTagResult res = await cli.AnalyzeTextAsync(
    new TagTextRequest("I live in Brussels and you?")) as AnalyzeTagResult;
foreach (var tokenTag in res.TokensAndTags)
{
    Console.WriteLine("{0}:{1}", tokenTag.token, tokenTag.tag);
}

resulting in this:
[Screenshot: linguisticapi2]

  • A simple tokenization returning the raw array of token arrays

So feel free to try it out!
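For readers curious how helpers in the spirit of GetVerbs(), GetNouns() and GetAdjectives() could work on a combined token/tag list, here is a rough sketch. It is not the library's code: TagFilters and TokensWithTagPrefix are hypothetical names, and it assumes Penn Treebank-style tags, where noun tags start with "NN", verb tags with "VB" and adjective tags with "JJ".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagFilters
{
    // Returns the tokens whose tag starts with the given prefix.
    // Assumption: Penn Treebank-style tags ("NN*" nouns, "VB*" verbs, "JJ*" adjectives).
    public static List<string> TokensWithTagPrefix(
        IEnumerable<(string Token, string Tag)> pairs, string prefix)
    {
        return pairs
            .Where(p => p.Tag.StartsWith(prefix, StringComparison.Ordinal))
            .Select(p => p.Token)
            .ToList();
    }
}
```

With such a helper, TokensWithTagPrefix(pairs, "NN") would keep only the nouns of a phrase, which is the kind of filtering that helps when simplifying a search query.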

Feel free to watch my videos on Azure Cognitive Services

Happy NLP!

About Stephane Eyskens

Office 365, Azure PaaS and SharePoint platform expert