[Update: 09/2017]: I demonstrate this from A to Z in my online course on Channel9 https://channel9.msdn.com/niners/stephaneey . Feel free to watch episode 5, which demonstrates this using Azure Cognitive Services.
I recently wrote a blog post on SharePoint’s nextgen searchbox that showed how to use a bot to query SharePoint instead of the regular searchbox. As many of you know, SharePoint’s search engine is very powerful, but end users rarely leverage even 10% of the keyword query syntax. Having a bot build such queries automatically, by interpreting end users’ questions and “translating” them into SP queries, is a great way of letting end users express their needs in natural language while the system figures out what they want.
I’m not going to repeat what I stated in that blog post, but I’d like to add another dimension: Part-of-Speech (POS) tagging, which can help build very accurate queries.
So, the idea is to have a flow similar to this:
We use LUIS as the understanding layer to detect an intent; in this case, we understand that the user is looking for some documentation/KB information. With POS tagging, we identify the role of each word in the original sentence and keep only the categories that matter for querying the source of information (step 5). Why do that? Simply to remove noisy words such as “how”, “to”, “on” and “a”, so as to get “install onedrive smartphone”. That’s a search phrase a search engine likes much more than the original sentence. Of course, there is no black-or-white answer on how best to select the categories to keep, as it probably depends on your business case. But by combining this with the entity extraction mechanism of LUIS, you can start building very powerful and accurate queries.
For instance, with the above sentence, you could end up with a query such as:
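To make the filtering step more concrete, here is a minimal sketch in Python. It assumes the tagger has already returned (word, tag) pairs using Penn Treebank tags, and that nouns and verbs are the categories worth keeping. The input sentence, the hand-written tags and the names KEPT_TAG_PREFIXES and build_search_phrase are all illustrative assumptions, not the output of any particular tagger.

```python
# Penn Treebank tag prefixes to keep: nouns (NN*) and verbs (VB*).
# Which categories to keep is an assumption; tune it to your business case.
KEPT_TAG_PREFIXES = ("NN", "VB")


def build_search_phrase(tagged_tokens, kept_prefixes=KEPT_TAG_PREFIXES):
    """Join the words whose POS tag starts with one of the kept prefixes."""
    kept = [word for word, tag in tagged_tokens if tag.startswith(kept_prefixes)]
    return " ".join(kept)


# Hand-tagged example sentence (assumed); a real tagger would produce these pairs.
tagged = [
    ("how", "WRB"), ("to", "TO"), ("install", "VB"), ("onedrive", "NN"),
    ("on", "IN"), ("a", "DT"), ("smartphone", "NN"),
]

print(build_search_phrase(tagged))  # install onedrive smartphone
```

Keeping the list of categories as a parameter makes it easy to experiment, for example by adding adjectives ("JJ") if they turn out to matter for your content.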
tags:onedrive AND (QnAQuestionOWSMTXT:(install NEAR onedrive NEAR smartphone)) OR (QnAAnswerOWSMTXT:(install NEAR onedrive NEAR smartphone)) XRANK(cb=100) QnAQuestionOWSMTXT:(install NEAR onedrive)
where QnAQuestionOWSMTXT and QnAAnswerOWSMTXT would be SharePoint Managed Properties pointing to question/answer pairs stored in a list (as an alternative to Microsoft’s QnA for instance), on which Enterprise Keywords are enabled.
Now that you get the point, which system should you use to perform POS tagging? Again, there is no black-or-white answer; it depends on where the actual code is hosted. Microsoft has its Cloud-based API, Stanford has a free tagger (ported to .NET), and languages commonly used for NLP, such as R and Python, also come with such features.
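As a rough illustration, a query of that shape could be assembled from the LUIS entity and the POS-filtered keywords along these lines. This is only a sketch: the function name build_query, its parameters and the choice of which terms to boost with XRANK are assumptions made for the example.

```python
def near(terms):
    """Chain terms with the KQL NEAR operator."""
    return " NEAR ".join(terms)


def build_query(tag, keywords, boost_terms,
                question_prop="QnAQuestionOWSMTXT",
                answer_prop="QnAAnswerOWSMTXT",
                boost=100):
    """Assemble a query string shaped like the example above.

    tag         -- entity detected by the understanding layer (assumed input)
    keywords    -- words kept after POS filtering
    boost_terms -- subset of keywords to boost on the question property
    """
    phrase = near(keywords)
    return (
        f"tags:{tag} AND ({question_prop}:({phrase})) "
        f"OR ({answer_prop}:({phrase})) "
        f"XRANK(cb={boost}) {question_prop}:({near(boost_terms)})"
    )


query = build_query(
    tag="onedrive",
    keywords=["install", "onedrive", "smartphone"],
    boost_terms=["install", "onedrive"],
)
print(query)
```

Passing the managed property names as parameters keeps the sketch reusable for other lists with different property names.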