Building the nextgen SharePoint search through a BOT and LUIS?

Hi,

I’ve recently worked on creating a BOT with the Microsoft Bot framework that handles queries from end users expressed in natural language. The BOT leverages #LUIS, Microsoft’s NLP engine, in order to extract entities and semantics out of the queries. At the time of writing, both LUIS and the Microsoft Bot framework are still in preview but they let us envision great possibilities.

The SharePoint search engine is very powerful as it is fast and highly tunable. However, most of the time, users simply use it as they use Google or Bing. They know neither the names of the managed properties we created nor the out-of-the-box keyword query elements such as isDocument:1.

So, what about creating a Document Finder BOT that’d be a front-end component to SharePoint (and potentially other systems) and that would generate accurate SharePoint queries, based on the user input expressed in natural language? What about mapping LUIS entities to SharePoint managed properties? To give you a very basic example, the following sentences:

  • I’m looking for the OneDrive FAQ
  • Where is the OneDrive FAQ?
  • I’m looking for a SharePoint manual
  • I want to get the SharePoint manual

could be mapped to a LUIS intent named “FindDocumentation” and “OneDrive”, “SharePoint” would be entities of type “Software” while “FAQ” and “Manual” would be entities of type “TypeOfDocument”.

As you probably already understood, these documents could be uploaded in SharePoint with the corresponding managed properties, where we’d have the SharePoint manual being tagged the following way: Topic:SharePoint DocumentType:Manual.

The rest of the work would consist in translating the original sentence into a SharePoint search query such as: Topic:SharePoint DocumentType:FAQ isDocument:1.

This auto-generated, very specific SharePoint query will return very accurate results. If no results are returned, one could let the BOT perform another query against SharePoint using the original text, so as to return at least something, even if less accurate.
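To make the idea concrete before diving into the real BOT, here is a minimal sketch of the two steps: building the precise query from extracted entities, then falling back to the original text. The class, the method names and the search delegate are hypothetical and only stand in for the real SharePoint call; the property names Topic and DocumentType are the ones used in the example above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the Bot Framework or SharePoint APIs.
public static class DocumentQuerySketch
{
    // Turns (entity type, value) pairs into "Property:Value" restrictions,
    // skipping entity types that have no mapped managed property.
    public static string BuildPreciseQuery(
        IEnumerable<KeyValuePair<string, string>> entities,
        IDictionary<string, string> entityToManagedProperty)
    {
        var parts = new List<string>();
        foreach (var entity in entities)
        {
            string property;
            if (entityToManagedProperty.TryGetValue(entity.Key, out property))
            {
                parts.Add(property + ":" + entity.Value);
            }
        }
        // Only restrict to documents when at least one property restriction exists.
        if (parts.Count > 0) parts.Add("isDocument:1");
        return string.Join(" ", parts);
    }

    // Runs the precise query first and falls back to the user's original text
    // when the precise query is empty or returns nothing.
    public static IList<string> SearchWithFallback(
        Func<string, IList<string>> search, string preciseQuery, string originalText)
    {
        if (!string.IsNullOrWhiteSpace(preciseQuery))
        {
            var precise = search(preciseQuery);
            if (precise != null && precise.Count > 0) return precise;
        }
        return search(originalText) ?? new List<string>();
    }
}
```

The search delegate keeps the sketch independent from SharePoint: in the real BOT it would wrap the actual search call, while in a test it can simply return canned results.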

OK, I hope you got the point, so let’s see how to build such a boilerplate BOT and LUIS model to get started.

Note that I’m not going to explain what LUIS is, what Azure AD is (you’ll need the user token to gain access to SharePoint) or what the BOT framework is, but I wrote some specific blog posts about user authentication with a BOT. I’ll just enumerate what needs to be done to reach the objective.

In a nutshell, we’ll do the following:

  • Create a LUIS model with a single Intent named “FindDocumentation”
  • Create a few entities in that model
  • Create a site collection in SharePoint Online where we’ll create managed properties that match the entity types we have defined in LUIS
  • Develop a BOT and consume it via the webchat channel

Before we start, here is the overall result, which will either make you even more interested or make you run away, but at least you won’t waste time reading the rest of this post if it’s not for you:

[Screenshot: luis9]

So, as you can see, the user is looking for two different types of documents and the BOT performs the SharePoint search on his behalf and comes back with accurate results.

Creation of the LUIS model

Go to https://www.luis.ai/ and sign in with a Microsoft Account. Create a new application, give it a name, choose BOT as the usage scenario, select one or more categories and finish by specifying the application culture (English for this example).

Once the application is created, create two new entities, namely Software and TypeOfDocument. Then create a new Intent named FindDocumentation. Once that’s done, you can start entering new utterances. In my case, I’ve entered the following ones:

I need to find a SharePoint Manual
Where is the SharePoint FAQ?
Where is the OneDrive FAQ?
I’m looking for OneDrive documentation
How to get the Azure manual?

and here is a screenshot showing how to tell LUIS about the entities:

[Screenshot: luis1]

By default, the words SharePoint and Manual won’t be highlighted. By clicking on the words, you have to tell LUIS about their respective entity types. Make sure the right intent is shown in the dropdown and click on submit. Enter the other phrases and do the same. Of course, the more phrases you specify, the more accurate LUIS will be, but we won’t go too far for this post. Note that LUIS also provides an Active Learning engine that lets you adjust the model later on, once the users have sent queries. There is no need to be exhaustive from the start.

Once you’re done, click on the Train button (bottom right) and then, click on the Publish button at the top. This will reveal the service endpoint for your LUIS model as shown here:

[Screenshot: luis2]

Note that the URL contains the app id and the secret you’ll have to use later on in your BOT.

SharePoint Online

Take any SharePoint Site collection where you can afford to manipulate the search schema and create term stores (local or global as you wish).

You should end up with a taxonomy similar to this one:

[Screenshot: luis3]

Now, it’s time to build a content type that uses this taxonomy. In my example, I end up with the following content type:

[Screenshot: luis4]

where Main Topics (multi-valued) and Doc Type respectively point to the Topics and Type of document term stores.

Now, we can associate this content type to a document library, upload some documents into it and create our managed properties.

In the end, you should have documents that were uploaded and tagged correctly:

[Screenshot: luis6]

You have to wait for SharePoint to crawl the newly added content. An easy way to test whether it’s OK is simply to search directly in the document library. Once you get results, you’re good to create your managed properties and obtain something similar to this:

[Screenshot: luis7]

To make sure your SharePoint configuration is OK, you should first perform a query using these managed properties within SharePoint directly before trying from the BOT itself. As an example, here is what it looks like in my environment:

[Screenshot: luis8]

From the above test, I know for sure that I can rely on the kbDocType managed property. Note that if you want to be more generic, you can also use the generic Tags and/or the enterprise keywords.

Developing the BOT

This is probably the hardest part of the work but I’m not gonna explain all the bits and bytes. You should first read the official documentation and then read my own blog posts on how to authenticate the end users (one possible technique).

That said, before developing the BOT, you need to register one. For that, just go to the registration page, follow the instructions and make sure to store the BOT id and password for further use.

To develop the boilerplate of the BOT and be able to authenticate the user, I’d recommend reading my other blog post and maybe even starting from the GitHub project I created, as most of the work is already done there.

Now, I’m going to focus only on the code that is bound to our Bot/Luis/SharePoint talk.

Here is my webchat controller:

[Authorize]
public class WebChatController : ApiController
{
    private string clientId = ConfigurationManager.AppSettings["ida:ClientId"];
    private string appKey = ConfigurationManager.AppSettings["ida:ClientSecret"];
    private string aadInstance = ConfigurationManager.AppSettings["ida:AADInstance"];
    private string tenantId = ConfigurationManager.AppSettings["ida:TenantId"];

    private string SPAccessToken = null;
    public HttpResponseMessage Get()
    {
        GetSPTokenSilent().Wait();
        var userId = ClaimsPrincipal.Current.FindFirst(ClaimTypes.NameIdentifier).Value;

        //the secret, the user id and the user name are passed as query string parameters
        string WebChatString =
            new WebClient().DownloadString("https://webchat.botframework.com/embed/DocumentFinder?s=<your secret>&userid=" +
            HttpUtility.UrlEncode(userId) + "&username=" + HttpUtility.UrlEncode(ClaimsPrincipal.Current.Identity.Name));

        WebChatString = WebChatString.Replace("/css/botchat.css", "https://webchat.botframework.com/css/botchat.css");
        WebChatString = WebChatString.Replace("/scripts/botchat.js", "https://webchat.botframework.com/scripts/botchat.js");
        var response = new HttpResponseMessage();
        response.Content = new StringContent(WebChatString);
        response.Content.Headers.ContentType = new MediaTypeHeaderValue("text/html");
        //store the SharePoint token in the bot state, keyed by channel and user id,
        //so that the dialog can read it back later
        var botCred = new MicrosoftAppCredentials(
            ConfigurationManager.AppSettings["MicrosoftAppId"],
            ConfigurationManager.AppSettings["MicrosoftAppPassword"]);
        var stateClient = new StateClient(botCred);
        BotData botData = new BotData(eTag: "*");
        botData.SetProperty<string>("SPAccessToken", SPAccessToken);
        stateClient.BotState.SetUserDataAsync("webchat", userId, botData).Wait();
        return response;
    }

    private Task GetSPTokenSilent()
    {
        return Task.Run(async () =>
        {
            string userObjectID = ClaimsPrincipal.Current.FindFirst("http://schemas.microsoft.com/identity/claims/objectidentifier").Value;
            string signedInUserID = ClaimsPrincipal.Current.FindFirst(ClaimTypes.NameIdentifier).Value;
            AuthenticationContext authContext = new AuthenticationContext(aadInstance + tenantId, new ADALTokenCache(signedInUserID));
            ClientCredential cred = new ClientCredential(clientId, appKey);
            AuthenticationResult res = await authContext.AcquireTokenSilentAsync("https://eyskens.sharepoint.com", cred,
                new UserIdentifier(userObjectID, UserIdentifierType.UniqueId));
            SPAccessToken = res.AccessToken;
        });

    }
}

I’m getting an AccessToken for the SharePoint resource (https://eyskens.sharepoint.com/) and I save it into the BotState using the user identifier as a key. I’m then getting the content of the actual webchat control, passing in the secret and the userid parameters. That is fully explained in my other blog post.

Now, here is the code of my message controller (message endpoint of the BOT):

[BotAuthentication]
public class MessagesController : ApiController
{
    public async Task<HttpResponseMessage> Post([FromBody]Activity activity)
    {
        var cli = new ConnectorClient(new Uri(activity.ServiceUrl));
        var TypingReply = activity.CreateReply();
        TypingReply.Type = ActivityTypes.Typing;
        await cli.Conversations.ReplyToActivityAsync(TypingReply);

        switch (activity.GetActivityType())
        {

            case ActivityTypes.Message:
                await Conversation.SendAsync(activity, () => new DocumentFinderDialog());
                break;

            case ActivityTypes.ConversationUpdate:
                var client = new ConnectorClient(new Uri(activity.ServiceUrl));
                IConversationUpdateActivity update = activity;

                //MembersAdded can be null, so check it before calling Any()
                if (update.MembersAdded != null && update.MembersAdded.Any())
                {
                    var reply = activity.CreateReply();
                    var newMembers = update.MembersAdded?.Where(t => t.Id != activity.Recipient.Id);
                    foreach (var newMember in newMembers)
                    {
                        reply.Text = "Hello I'm here to help you finding documents, tell me what you're looking for";
                        await client.Conversations.ReplyToActivityAsync(reply);
                    }
                }
                break;
            case ActivityTypes.ContactRelationUpdate:
            case ActivityTypes.Typing:
            case ActivityTypes.DeleteUserData:
            case ActivityTypes.Ping:
            default:
                HandleSystemMessage(activity);
                break;
        }

        var response = Request.CreateResponse(HttpStatusCode.OK);
        return response;
    }

    private Activity HandleSystemMessage(Activity message)
    {//I emptied it for sake of brevity
        return null;
    }
}

Here I simply start my LUIS dialog. All the rest is default boilerplate code. Now, maybe the most interesting part, the LUIS dialog itself:

[LuisModel("b24fa73c------8bf3d15", "32a0b-------2d")]
[Serializable]
public class DocumentFinderDialog : LuisDialog<object>
{
    const string SPAccessTokenKey = "SPAccessToken";
    const string SPSite = "https://eyskens.sharepoint.com/sites/kb";

    private static readonly Dictionary<string, string> PropertyMappings
        = new Dictionary<string, string>
    {
        { "TypeOfDocument", "kbDocType" },
        { "Software", "kbTopic" }
    };

    [Serializable]
    public class PartialMessage
    {
        public string Text { set; get; }
    }
    private PartialMessage message;

    internal DocumentFinderDialog() { }

    protected override async Task MessageReceived(IDialogContext context,
        IAwaitable<Microsoft.Bot.Connector.IMessageActivity> item)
    {
        var msg = await item;

        if (string.IsNullOrEmpty(context.UserData.Get<string>(SPAccessTokenKey)))
        {
            MicrosoftAppCredentials cred = new MicrosoftAppCredentials(
                ConfigurationManager.AppSettings["MicrosoftAppId"],
                ConfigurationManager.AppSettings["MicrosoftAppPassword"]);
            StateClient stateClient = new StateClient(cred);
            BotState botState = new BotState(stateClient);
            BotData botData = await botState.GetUserDataAsync(msg.ChannelId, msg.From.Id);
            context.UserData.SetValue<string>(SPAccessTokenKey, botData.GetProperty<string>(SPAccessTokenKey));
        }

        this.message = new PartialMessage { Text = msg.Text };
        await base.MessageReceived(context, item);
    }

    [LuisIntent("")]
    [LuisIntent("None")]
    public async Task NoneIntent(IDialogContext context,
        Microsoft.Bot.Builder.Luis.Models.LuisResult result)
    {
        await context.PostAsync("Sorry but I did not understand what you're looking for!");
        context.Wait(MessageReceived);
    }

    [LuisIntent("FindDocumentation")]
    public async Task INeedIntent(IDialogContext context,
        Microsoft.Bot.Builder.Luis.Models.LuisResult result)
    {
        var reply = context.MakeMessage();
        try
        {
            reply.AttachmentLayout = AttachmentLayoutTypes.Carousel;
            reply.Attachments = new List<Microsoft.Bot.Connector.Attachment>();
            StringBuilder query = new StringBuilder();
            bool QueryTransformed = false;
            foreach (var entity in result.Entities)
            {
                //only entity types mapped to a managed property contribute to the query
                if (PropertyMappings.ContainsKey(entity.Type))
                {
                    query.AppendFormat("{0}:'{1}' ", PropertyMappings[entity.Type], entity.Entity);
                    QueryTransformed = true;
                }
            }
            if (!QueryTransformed)
            {
                //no usable entity: use the original text instead
                //should replace all special chars
                query.Append(this.message.Text.Replace("?", ""));
            }

            using (ClientContext ctx = new ClientContext(SPSite))
            {
                ctx.AuthenticationMode = ClientAuthenticationMode.Anonymous;
                ctx.ExecutingWebRequest +=
                    delegate (object oSender, WebRequestEventArgs webRequestEventArgs)
                    {
                        webRequestEventArgs.WebRequestExecutor.RequestHeaders["Authorization"] =
                            "Bearer " + context.UserData.Get<string>("SPAccessToken");
                    };
                KeywordQuery kq = new KeywordQuery(ctx);
                kq.QueryText = string.Concat(query.ToString(), " IsDocument:1");
                kq.RowLimit = 5;
                SearchExecutor se = new SearchExecutor(ctx);
                ClientResult<ResultTableCollection> results = se.ExecuteQuery(kq);
                ctx.ExecuteQuery();

                if (results.Value != null && results.Value.Count > 0 && results.Value[0].RowCount > 0)
                {
                    reply.Text += (QueryTransformed == true) ? "I found some interesting reading for you!" : "I found some potential interesting reading for you!";
                    BuildReply(results, reply);
                }
                else
                {
                    if (QueryTransformed)
                    {
                        //fallback with the original message
                        kq.QueryText = string.Concat(this.message.Text.Replace("?", ""), " IsDocument:1");
                        kq.RowLimit = 3;
                        se = new SearchExecutor(ctx);
                        results = se.ExecuteQuery(kq);
                        ctx.ExecuteQuery();
                        if (results.Value != null && results.Value.Count > 0 && results.Value[0].RowCount > 0)
                        {
                            reply.Text += "I found some potential interesting reading for you!";
                            BuildReply(results, reply);
                        }
                        else
                            reply.Text += "I could not find any interesting document!";
                    }
                    else
                        reply.Text += "I could not find any interesting document!";

                }

            }

        }
        catch (Exception ex)
        {
            reply.Text = ex.Message;
        }
        await context.PostAsync(reply);
        context.Wait(MessageReceived);
    }
    void BuildReply(ClientResult<ResultTableCollection> results, IMessageActivity reply)
    {
        foreach (var row in results.Value[0].ResultRows)
        {
            List<CardAction> cardButtons = new List<CardAction>();
            List<CardImage> cardImages = new List<CardImage>();
            string ct = string.Empty;
            string icon = string.Empty;
            switch (row["FileExtension"].ToString())
            {
                case "docx":
                    ct = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
                    icon = "https://cdn2.iconfinder.com/data/icons/metro-ui-icon-set/128/Word_15.png";
                    break;
                case "xlsx":
                    ct = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
                    icon = "https://cdn2.iconfinder.com/data/icons/metro-ui-icon-set/128/Excel_15.png";
                    break;
                case "pptx":
                    ct = "application/vnd.openxmlformats-officedocument.presentationml.presentation";
                    icon = "https://cdn2.iconfinder.com/data/icons/metro-ui-icon-set/128/PowerPoint_15.png";
                    break;
                case "pdf":
                    ct = "application/pdf";
                    icon = "https://cdn4.iconfinder.com/data/icons/CS5/256/ACP_PDF%202_file_document.png";
                    break;

            }
            cardButtons.Add(new CardAction
            {
                Title = "Open",
                Value = (row["ServerRedirectedURL"] != null) ? row["ServerRedirectedURL"].ToString() : row["Path"].ToString(),
                Type = ActionTypes.OpenUrl
            });
            cardImages.Add(new CardImage(url: icon));
            ThumbnailCard tc = new ThumbnailCard();
            tc.Title = (row["Title"] != null) ? row["Title"].ToString() : "Untitled";
            tc.Text = (row["Description"] != null) ? row["Description"].ToString() : string.Empty;
            tc.Images = cardImages;
            tc.Buttons = cardButtons;
            reply.Attachments.Add(tc.ToAttachment());
        }
    }
}

I start by binding my LUIS Dialog class to my model using the LuisModel attribute. I then declare some constants and a dictionary containing the mappings between the LUIS entity types and the SharePoint managed properties. This will serve me later on when generating the SharePoint query after having “understood” the user query.

I then declare a variable of type PartialMessage as I want to perform a second SharePoint query in case the first one (with the managed properties) doesn’t bring any result. I’m overriding the MessageReceived method in order to store the SharePoint AccessToken that was issued by my webchat controller into the UserData property bag. I also retrieve the original text entered by the user and save it into my message variable.

The method decorated with the LuisIntent attribute and FindDocumentation intent is triggered whenever a user sends a message that corresponds to this intent. In that method, I’m checking whether LUIS was able to extract entities and I’m generating the SharePoint query based on these. If no entity was extracted, I use the original text as the target query.
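The dialog only strips the question mark from the original text and carries a comment saying that all special characters should be replaced. A minimal sketch of a more conservative clean-up could look like the following. The class and method names are hypothetical, and the choice of keeping only letters and digits is an assumption on my side rather than an official list of characters to escape.

```csharp
using System.Text;

// Hypothetical helper for illustration; not part of the dialog shown above.
public static class QueryTextSketch
{
    // Keeps letters and digits, turns every other run of characters
    // into a single space, and trims the result.
    public static string ToPlainKeywords(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return string.Empty;

        var sb = new StringBuilder(text.Length);
        bool lastWasSpace = true; // true at the start so no leading space is emitted
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                sb.Append(c);
                lastWasSpace = false;
            }
            else if (!lastWasSpace)
            {
                sb.Append(' ');
                lastWasSpace = true;
            }
        }
        return sb.ToString().TrimEnd();
    }
}
```

With this helper, "Where is the OneDrive FAQ?" becomes "Where is the OneDrive FAQ". The trade-off is that apostrophes are dropped as well, so "I'm" becomes "I m", which is usually harmless for a keyword search.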

I then perform the SharePoint query using CSOM and the user’s AccessToken. If no result is returned, I perform a second query with the original text. Last but not least, the BuildReply method deals with the UI elements of the Bot Framework to display the results. I’m only handling Office & PDF documents here.
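Since the switch in BuildReply has no default branch, any other file extension ends up with empty values. As a possible alternative, here is a small table-driven sketch that maps an extension to its content type and falls back to a generic value. The class name, the method name and the application/octet-stream fallback are my own assumptions; the four content types are the ones used in the dialog.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the dialog shown above.
public static class FileTypeSketch
{
    // Case-insensitive lookup so that "PDF" and "pdf" are treated the same way.
    private static readonly Dictionary<string, string> ContentTypes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document" },
            { "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" },
            { "pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation" },
            { "pdf", "application/pdf" }
        };

    // Returns the known content type, or a generic one for anything else.
    public static string ContentTypeFor(string fileExtension)
    {
        string contentType;
        if (fileExtension != null &&
            ContentTypes.TryGetValue(fileExtension.TrimStart('.'), out contentType))
        {
            return contentType;
        }
        return "application/octet-stream";
    }
}
```

Supporting a new file type then only requires a new dictionary entry, and the same approach could be applied to the icons.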

Additional features

In this blog post, I focused only on a simple scenario and a very limited set of entities/intents; I leave the rest to your imagination. However, by leveraging Azure Cognitive Services, some additional features such as key phrase extraction, concept extraction, etc. could also be handled by such a BOT. AI has never been so easy to manipulate.

Happy Coding!

About Stephane Eyskens

Office 365, Azure PaaS and SharePoint platform expert
This entry was posted in Azure, SharePoint, SharePoint Online.

4 Responses to Building the nextgen SharePoint search through a BOT and LUIS?

  1. Pingback: LUIS and POS-Tagging, better together to build great bots | Welcome to my blog! Stéphane Eyskens, Office 365 and Azure PaaS Architect

  2. Pingback: SharePoint Bots… Is Clippy Back From The Dead?

  3. kiran says:

    how can bot access sharepoint on premise.. my bot is hosted on azure and it cannot talk to sharepoint on premise


    • Stephane Eyskens says:

      There are several ways to achieve this. One is to set up an Azure AD Application Proxy app that will allow both remote connections & authentication from the online to the on-prem world. Another way is to simply have a site-to-site VPN between your online env & your on-prem one, but this will not solve identity-related problems unless you can afford to work with a service account. There are also service bus relays, etc. Plenty of ways to interconnect the online & on-prem worlds.
