It is always dangerous to compare software and services from different vendors, as benchmarking is rarely exhaustive and can be subject to interpretation and misunderstanding. On top of that, hardcore fans of a vendor may lose their common sense and objectivity, as the discussion can quickly turn emotional.
However, I recently had the opportunity to get a demo of Watson from a seasoned IBM consultant, which led me to try out and explore Watson a little further. I have been working with Azure Cognitive Services for more than a year, especially using LUIS and the bot framework to build chatbots. On top of my Azure experience, I have some background in AI & NLP in general, as I’ve been involved in multiple initiatives (for instance, a package I wrote on DBPedia Spotlight) over the past 3 years, using neither IBM nor Microsoft services.
In this post, I want to shed some light on both offerings and try to be as objective as possible. Don’t think I will favor Microsoft because I’m an MVP: I’m first and foremost an independent consultant, and I always keep my eyes wide open to what other vendors do!
Side note before we get started: both offerings are moving targets, so today’s truth is neither yesterday’s nor tomorrow’s. Consider this blog post a snapshot of today’s situation and double-check what is said below, as it might have changed by the time you read it.
Ok, clarification made, let’s get started!
Here are, step by step, all the things I tested.
Getting started with a testing environment
Here are the pointers to register for trial versions if you want to give it a try yourself:
- Azure Subscription (encompasses the Cognitive Services): https://azure.microsoft.com/en-us/free/
- IBM Bluemix (encompasses Watson): https://console.bluemix.net/registration/
Best offering: IBM, rationale: no credit card is required during sign-up, which is always more comfortable.
Creating an environment
Once your subscription is active, you can create an environment. This already works differently between Bluemix and Azure. Bluemix requires you to create a container in a region and create your services inside that container, while Azure is more granular since you can associate any service with any location. Azure also comes with many more locations than Bluemix: more than 20 for Cognitive Services, versus only 5 for Bluemix at the time of writing.
Best offering: Microsoft, rationale: more granular and many more datacenters, meaning that the likelihood of finding a location close to your apps & users is higher. Granularity may be achieved with Bluemix by creating multiple containers and spreading your Watson services over them, but that would require you to use multiple browser tabs to manage them all. In Azure, you can mix services spread over different locations within the same ‘container’ in a single UI view.
LUIS vs Conversation Service
There is a fundamental difference in the way LUIS and IBM’s Conversation Service were designed. LUIS serves the purpose of resolving user intents and extracting entities (NER), and any kind of application, including chatbots, may use it. The Conversation Service of IBM, as its name indicates, is dedicated to conversations only, while the conversational (dialogs) aspect with Microsoft is handled by the bot framework. Note that IBM’s Conversation Service API lets you send a piece of text and get back the matching intent(s) and entities without a conversation context. So, it is technically possible to use the Conversation Service outside of a conversation, but that deviates from its intended purpose.
Creating intents is about the same user experience for both IBM & Microsoft. LUIS is limited to 80 intents while IBM can go up to 2000 intents. Of course, to work around this LUIS limitation, one can always use multiple apps for the same chatbot and/or application. However, whatever technology is used, I’m a big advocate of narrowing down the scope of a chatbot for many reasons I have explained in this blog post (in a nutshell: things like polysemy, neosemy, humour and irony are easier to tackle within a limited scope). So, if you concur with this philosophy, 80 intents is more than enough to build a single chatbot.
Here again, there is a huge difference in the way they handle NER. LUIS is able to extract entities dynamically while Watson performs its extraction via a predefined list of values. So, to be very concrete, if you deal with something like this:
I need help with Azure
Azure could be an entity of type software. In LUIS, you’d define software and you’d add a few utterances this way:
I need help with Azure
I need help with SharePoint
I need help with OneDrive
and you’d tag Azure, SharePoint and OneDrive as instances of the software entity type. Next time LUIS sees a pattern like I need help with xxx, it would extract xxx automatically for you. On top of this dynamic behavior, you can help LUIS recognize entities by using phrase list features (lists of values).
With Watson, you’re required to provide an exhaustive list of possible values for a given entity:
You can define synonyms and enable Fuzzy Matching, meaning the ability to recognize misspelled words (brain vs brian for instance).
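As a rough illustration of this list-based approach, here is a minimal Python sketch. It is not how Watson is implemented: the list of values and the function name are hypothetical, and the fuzzy matching relies on Python’s difflib module as a stand-in.

```python
import difflib

# Hypothetical list of values for a "software" entity (an assumption for this sketch).
SOFTWARE_VALUES = ["azure", "sharepoint", "onedrive"]


def extract_from_list(utterance, values=SOFTWARE_VALUES, cutoff=0.8):
    """Return the list values found in the utterance, tolerating small typos."""
    found = []
    for word in utterance.lower().split():
        # Keep the single closest value, if it is similar enough to the word.
        match = difflib.get_close_matches(word, values, n=1, cutoff=cutoff)
        if match:
            found.append(match[0])
    return found
```

Under these assumptions, a misspelling such as sharepiont still maps to sharepoint, while a value that is missing from the list (SAP for instance) is never returned.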
While this offers greater accuracy, it is first and foremost a limiting factor because it’s completely static here. There is no so-called intelligence in this way of extracting entities. For one of the bots I wrote, I had to extract incident & request numbers relating to IT services. Of course, these numbers were completely dynamic as they were generated by the ITSM backend. With LUIS, I used an entity of type incident & another one of type request. Then, I trained LUIS the following way:
Status of INC000011111
Status of INC000011112
Status of INC000011113
Status of REQ000011111
by tagging each identifier as either incident or request. On top of this training, I used a regex feature to tell LUIS that inc|req followed by a number was meaningful for my business. Capturing the identifier was essential in order to show the status of the incident/request. How do you tackle such scenarios with Watson? I guess you’ll have to do it with custom code, as Watson doesn’t do the job for you. This is yet another truly fundamental difference between the Conversation Service & LUIS.
Both IBM & Microsoft come with system entity types for recognizing URLs, dates etc. While such tools save you a lot of time, they are of course not specific to your own business. To me, the custom entity part is way more important, as custom entities allow you to adapt your bot to your specific needs.
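If you had to handle this with custom code, a regular expression would probably be enough. Here is a minimal Python sketch, assuming identifiers always look like the examples above (INC or REQ followed by digits); the function name and the returned structure are hypothetical.

```python
import re

# Assumed identifier format, based on the sample utterances: INC or REQ followed by digits.
TICKET_PATTERN = re.compile(r"\b(INC|REQ)(\d+)\b", re.IGNORECASE)


def extract_tickets(utterance):
    """Return (entity_type, identifier) pairs found in the utterance."""
    results = []
    for match in TICKET_PATTERN.finditer(utterance):
        prefix = match.group(1).upper()
        entity_type = "incident" if prefix == "INC" else "request"
        # Normalize the identifier to upper case before looking it up in a backend.
        results.append((entity_type, match.group(0).upper()))
    return results
```

The captured identifier can then be passed to the ITSM backend to look up the status.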
IBM’s Conversation Service encompasses a dialog UI enabling you to design your dialog without a single line of code for basic stuff:
Dialog entry points are mapped to intents, and from each dialog entry you can either respond with static answers (text) or use slots to establish a sequence of interactions. As you can see from the above screenshot, there is a notion of context which lets you deal with variables across the whole conversation. While this is better than nothing (MS doesn’t come with such features), it’s still far from sufficient. To me, it’s absolutely not usable within the enterprise as is. I guess that most companies build chatbots to help users interact with various systems from a single entry point (the bot itself). In such a context, dialogs will most of the time lead to dynamic answers (not static) with data coming from external sources. So, your chatbots will have to talk to APIs to perform CRUD operations on various external systems. This isn’t handled at all by the dialog UI. There aren’t even any hook capabilities, so this is merely a static list of steps with static text answers, and I don’t see how it could be used in a real-world scenario. So, clearly, don’t trust the no-code sales pitch!
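To illustrate the kind of custom code this implies, here is a minimal Python sketch, assuming the intent and entities have already been resolved by the NLP service. The FAKE_ITSM dictionary, the status intent and the handler names are all hypothetical; a real bot would call the backend’s API instead of reading a dictionary.

```python
# Hypothetical stand-in for an external ITSM system; a real bot would call its API.
FAKE_ITSM = {"INC000011111": "in progress", "REQ000011111": "closed"}


def handle_status(entities):
    """Build a dynamic answer from data held by the (fake) backend."""
    ticket = entities.get("ticket")
    if ticket is None:
        return "Which incident or request do you mean?"
    status = FAKE_ITSM.get(ticket)
    if status is None:
        return f"I could not find {ticket}."
    return f"{ticket} is {status}."


def handle_greetings(entities):
    """Static answer, the only kind a purely static dialog can give."""
    return "Hello!"


# Each resolved intent is routed to the code that knows how to answer it.
HANDLERS = {"status": handle_status, "greetings": handle_greetings}


def answer(intent, entities):
    handler = HANDLERS.get(intent)
    if handler is None:
        return "Sorry, I did not understand that."
    return handler(entities)
```

The point of the sketch is only that the answer depends on data living outside the dialog definition, which is exactly what a static list of steps cannot express.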
Best offering: ? Here I’m a little more puzzled because while I consider NER as a key feature of NLP and although this is far better handled by LUIS (see test results later in this article), IBM comes with a single integrated experience and it’s rather easy to get started with it.
This one is a no-brainer: Microsoft comes with 14 out of the box channels, including the ones supported by IBM, while IBM only connects to 4 channels. On top of this, Microsoft offers the DirectLine channel, which allows any app to create its own channel if required (from Skype for business on-prem for instance). This capability should not be underestimated, as channels represent a way to integrate your chatbots with other tools used within the enterprise. I have personally tested the Skype, Skype for business online, Teams, Web and DirectLine channels of Microsoft. DirectLine apart, all the other channels require a very small amount of configuration and are live almost immediately. Of course, channels do not all support the bot framework features equally, nor do they offer the exact same user experience, but they represent a very powerful way of integrating bots within your company’s ecosystem.
Best offering: Microsoft
Batch testing of Watson & LUIS against similar models
The above sections were more oriented towards the features exposed by these two products. I wanted to also evaluate how they compare with regards to understanding test data. In order to do so, I built a very basic LUIS model as well as a Watson workspace. Both contain:
- 3 intents (find_expert, greetings, issue)
- 1 entity type (expertize)
For the latter, I used a list of values in Watson and a phrase list feature in LUIS. The exact same utterances have been used to train both LUIS and Watson. For information, here is the exhaustive list of utterances that I injected in the model:
hello hi how are you what's up who can help me I forgot my password I cannot login my ipad is broken I can't login sharepoint is not working I'm looking for an expert who can help me with ITIL I need a SharePoint specialist I'm looking for an expert in onedrive my desktop is damaged I need a COBIT specialist who can help me with COBIT
So, the purpose here is to get a first look at their respective performance, not to draw definitive conclusions, as that would require a larger amount of both model & test data (which I couldn’t afford using the trial version of Bluemix). So, I used a very small number of arbitrary utterances to test both models. These are below:
How are you
How are you today
I hope you are fine
I need an expert in SharePoint
I’m looking for a specialist
who can help me with onedrive
who can help me with sap
I’m looking for a sap specialist
I need assistance with sharepoint
I don’t know bluemix so any help from an expert is welcome
my ipad is broken
my iphone doesn’t work
my android is completely out of use
I lost my password
I cannot login anymore
Utterances are split into 3 groups (greetings, find_expert and issue). I used sentences that are slightly different from the ones I used to train both LUIS & Watson. I should point out that I trained them both only once.
So, I sent these utterances to the APIs and I compiled the results in Excel. Here are the results of IBM:
As you probably noticed, lines in red are faulty results. So, we see 5 issues here with 3 kinds of problems:
- intent not recognized
- entity not extracted
Strangely enough, Watson doesn’t resolve Good night & good afternoon to the greetings intent. What I described earlier is also very visible in these results: Watson correctly resolved “who can help me with SAP” to the find_expert intent but was unable to detect SAP as an entity because SAP isn’t in its list of values. LUIS (see below) identifies SAP although it’s not part of its phrase list feature. To me, that’s a key difference.
Here are the results of LUIS
As you can see, there are only two mistakes (instead of 5 for Watson): “I hope you are fine” is resolved to issue although it should be greetings, and “I’m looking for a sap specialist” is resolved to find_expert (correct) but sap isn’t extracted this time. In “who can help me with sap”, however, LUIS correctly extracts SAP although it’s not listed in its phrase list feature. It’s able to do so because of the utterance patterns (who can help me with …) and the frequency of those patterns in the model data. Both IBM & MS handle the fixing of intent resolution faults equally well, by merely readjusting through the active learning feature.
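For those who would rather script such a batch test than compile the results in Excel, here is a minimal Python sketch. The classify callable is hypothetical: in practice it would wrap a call to the LUIS or Watson API. The stub_classifier below is only a keyword-based stand-in that makes the sketch runnable; it does not reproduce the behavior or the results of either service.

```python
def run_batch(test_cases, classify):
    """Send each utterance to `classify` and collect the mismatches."""
    failures = []
    for utterance, expected_intent, expected_entities in test_cases:
        intent, entities = classify(utterance)
        if intent != expected_intent:
            failures.append((utterance, "intent not recognized"))
        elif set(expected_entities) - set(entities):
            failures.append((utterance, "entity not extracted"))
    return failures


# Hypothetical list of known values, used by the stub only.
KNOWN_EXPERTISE = {"sharepoint", "onedrive"}


def stub_classifier(utterance):
    """Keyword-based stand-in for a real API call (an assumption, not a real model)."""
    text = utterance.lower()
    if "expert" in text or "specialist" in text or "who can help" in text:
        intent = "find_expert"
    elif text.startswith(("hello", "hi", "how are you")):
        intent = "greetings"
    else:
        intent = "issue"
    entities = [word for word in text.split() if word in KNOWN_EXPERTISE]
    return intent, entities


# A few of the test utterances, with the intent and entities I expect.
TEST_CASES = [
    ("How are you today", "greetings", []),
    ("who can help me with onedrive", "find_expert", ["onedrive"]),
    ("who can help me with sap", "find_expert", ["sap"]),
    ("I hope you are fine", "greetings", []),
    ("my ipad is broken", "issue", []),
]

failures = run_batch(TEST_CASES, stub_classifier)
```

Swapping stub_classifier for a function that calls the real service would give the same kind of failure list as the red lines in the result tables.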
Azure Bot Service
Compared with creating your bot from scratch, the Azure Bot Service comes with the following advantages:
- Automatic registration of your bot and its message endpoint
- Automatic association with LUIS
- Continuous integration configurable from scratch
- A single integrated UI to build, configure, test and view analytics of your bot
- The necessary boilerplate code to get started with a minimal bot
It’s still a preview feature (like many other things discussed in this article), but it surely deserves to be followed, as it looks very promising.
We’ve talked a lot about features so far but what about more technical things? As stated earlier, Microsoft handles conversational aspects with the Bot Framework. IBM handles this within its dialog UI but if your bot has to perform concrete actions such as processing an order for instance, you’ll of course need code as well. Microsoft’s bot framework comes in two flavours: Node.js and .NET.
IBM comes with SDKs for many more languages: Node.js, Java, Swift, Python, Unity and even .NET core, which helped me get a grasp of Watson. All the SDKs from both vendors are open source.
Both IBM & Microsoft have GitHub repositories with code samples and the code of the SDKs.
- LUIS is more dynamic than IBM’s Conversation Service for the reasons described earlier regarding how they extract entities. NER is a key process in NLP.
- LUIS is much more ML oriented than Watson as LUIS is really analyzing utterance patterns to perform dynamic entity extraction.
- The Conversation Service offers a dialog UI which isn’t yet enterprise ready, but Microsoft doesn’t offer any dialog UI at all, although this is coming.
- The bot framework has more out of the box channels than IBM.
If you want to build simple & static chatbots, Watson is way easier than Microsoft as it encompasses everything within the Conversation Service, but when it comes to enterprise-ready chatbots, I’d rather consider Microsoft for the following key aspects:
- Number of out of the box channels your bots can be accessed from (14 for Microsoft versus 4 for IBM, so about 3.5 times as many)
- Security: this heavily depends on your ecosystem, but a lot of companies, including non-Microsoft-friendly ones, run Active Directory as their user repository. Active Directory is more and more often synced to Azure Active Directory, which is Microsoft’s identity cornerstone in any hybrid/cloud scenario. This is key because within the enterprise, many chatbots will probably have to identify the users interacting with them. Being in the MS stack clearly simplifies integration with Azure Active Directory. Of course, if your company uses neither Active Directory nor Azure Active Directory, you can simply ignore this :).