IBM Watson’s Conversation Service vs LUIS+Bot framework to build chatbots


[Update 09/2017: learn how to bring huge capabilities to your bot by watching my free Channel9 course]

It is always dangerous to compare software and services from different vendors, as benchmarking is rarely exhaustive and can be subject to interpretation and misunderstanding. On top of that, hardcore fans of a vendor might lose their common sense and objectivity, as the discussion can quickly turn emotional.

However, I recently had the opportunity to see a demo of Watson from a seasoned IBM consultant, which led me to try out and explore Watson a little further. I have been working with Azure Cognitive Services for more than a year, especially using LUIS and the bot framework to build chatbots. On top of my Azure experience, I have some background in AI & NLP in general, as I’ve been involved in multiple initiatives (for instance, a package I wrote on DBPedia Spotlight) over the past 3 years, using neither IBM nor Microsoft services.

In this post, I want to shed some light on both offerings and try to be as objective as possible. Don’t think I will privilege Microsoft because I’m an MVP: I’m first and foremost an independent consultant, always keeping my eyes wide open to what other vendors do!

Side note before we get started: both offerings are moving targets, so today’s truth is neither yesterday’s nor tomorrow’s. Consider this blog post a snapshot of today’s situation and re-check what is said below, as it might have changed by the time you read it.

Ok, clarification made, let’s get started!

Here, step by step, is everything I tested.

Getting started with a testing environment

Here are the pointers to register for trial versions if you want to give it a try yourself:

  • Azure Subscription (encompasses the Cognitive Services):
  • IBM Bluemix (encompasses Watson):

Best offering: IBM, rationale: no credit card required during sign up which is always more comfortable.

Creating an environment

Once your subscription is active, you can create an environment. This already works differently between Bluemix and Azure. Bluemix requires you to create a container in a region and create your services inside that container, while Azure is more granular since you can associate any service with any location. Azure also comes with many more locations for Cognitive Services (more than 20) than Bluemix (only 5 at the time of writing).

Best offering: Microsoft, rationale: more granular and many more datacenters, meaning that the likelihood of finding a location close to your apps & users is higher. Granularity may be achieved with Bluemix by creating multiple containers and spreading your Watson services over them, but that’d require you to use multiple browser tabs to manage them all. In Azure, you can mix services spread over different locations within the same ‘container’ in a single UI view.

LUIS vs Conversation Service

There is a fundamental difference in the way LUIS and IBM’s Conversation Service were designed. LUIS serves the purpose of resolving user intents and extracting entities (NER), and any kind of application, including chatbots, may use it. The Conversation Service of IBM, as its name indicates, is dedicated to conversations only, while the conversational (dialogs) aspect with Microsoft is handled by the bot framework. Note that IBM’s Conversation Service API allows you to send a piece of text and get back the matching intent(s) and entities without a conversation context. So, it is technically possible to use the Conversation Service outside of a conversation, but that deviates from its intended purpose.


Creating intents is about the same user experience for both IBM & Microsoft. LUIS is limited to 80 intents while IBM can go up to 2000 intents. Of course, to work around this LUIS limitation, one can always use multiple apps for the same chatbot and/or application. However, whatever technology is used, I’m a big advocate of narrowing down the scope of a chatbot for many reasons I have explained in this blog post (in a nutshell: things like polysemy, neosemy, humour and irony are easier to tackle within a limited scope). So, if you concur with this philosophy, 80 intents is more than enough to build a single chatbot.
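The multi-app workaround can be sketched as follows. This is only an illustration, assuming each LUIS app is wrapped in a function that returns its top intent and a confidence score. The `App` type, the `route` function and the two stub apps are hypothetical names, and a real bot would call each app's prediction endpoint instead of the stubs.

```python
from typing import Callable, List, Tuple

# An "app" is any callable mapping an utterance to (intent, score).
# In a real bot each one would wrap a call to a separate LUIS app; the
# stubs below are hypothetical stand-ins so the sketch runs offline.
App = Callable[[str], Tuple[str, float]]


def it_support_app(utterance: str) -> Tuple[str, float]:
    # Pretend model: confident only when the utterance mentions a login problem.
    return ("issue", 0.9) if "login" in utterance.lower() else ("None", 0.1)


def expert_finder_app(utterance: str) -> Tuple[str, float]:
    # Pretend model: confident only when the utterance asks for an expert.
    return ("find_expert", 0.8) if "expert" in utterance.lower() else ("None", 0.2)


def route(utterance: str, apps: List[App], threshold: float = 0.5) -> str:
    """Ask every app and keep the highest-scoring intent above the threshold."""
    best_intent, best_score = max(
        (app(utterance) for app in apps), key=lambda result: result[1]
    )
    return best_intent if best_score >= threshold else "None"


APPS = [it_support_app, expert_finder_app]
print(route("I cannot login", APPS))
print(route("I'm looking for an expert", APPS))
print(route("Good morning", APPS))
```

One caveat with this design: scores coming from different models are not necessarily comparable, so the threshold is something to tune rather than trust.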

Entity extraction

Here again, there is a huge difference in the way they handle NER. LUIS is able to extract entities dynamically while Watson performs its extraction via a predefined list of values. So, to be very concrete, if you deal with something like this:

I need help with Azure

Azure could be an entity of type software. In LUIS, you’d define software and you’d add a few utterances this way:

I need help with Azure

I need help with SharePoint

I need help with OneDrive

and you’d tag Azure, SharePoint and OneDrive as instances of the software entity type. Next time LUIS sees a pattern like I need help with xxx, it’d extract xxx automatically for you. On top of this dynamic behavior, you can help LUIS recognize entities by using phrase list features (list of values).

With Watson, you’re required to provide an exhaustive list of possible values for a given entity:


You can define synonyms and enable Fuzzy Matching, meaning the ability to recognize misspelled words (brain vs brian for instance).
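To make the idea of list-based extraction with synonyms and fuzzy matching concrete, here is a minimal sketch. It is not how Watson implements it: it uses Python's standard `difflib` module, and the `SOFTWARE` dictionary, the `match_entity` function and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical entity definition: canonical value -> accepted synonyms.
SOFTWARE: Dict[str, List[str]] = {
    "Azure": ["azure", "microsoft azure"],
    "SharePoint": ["sharepoint", "share point"],
    "OneDrive": ["onedrive", "one drive"],
}


def match_entity(
    token: str, entity: Dict[str, List[str]], cutoff: float = 0.8
) -> Optional[str]:
    """Return the canonical value whose synonym is closest to `token`, if any."""
    # Flatten the definition into synonym -> canonical value.
    lookup = {syn: value for value, syns in entity.items() for syn in syns}
    # Keep only the single best candidate whose similarity reaches the cutoff.
    close = difflib.get_close_matches(token.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


print(match_entity("sharepiont", SOFTWARE))  # misspelled, still close to a listed value
print(match_entity("sap", SOFTWARE))         # not in the list, so nothing is returned
```

The second call shows the limitation discussed here: a value that is absent from the list can never be found, however the cutoff is tuned.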

While this offers greater accuracy, it is first and foremost a limiting factor because it’s completely static here. There is no so-called intelligence in this way of extracting entities. For one of the bots I wrote, I had to extract incident & request numbers relating to IT services. Of course, these numbers were completely dynamic as they were generated by the ITSM backend. With LUIS, I used an entity of type incident & another one of type request. Then, I trained LUIS the following way:

Status of INC000011111
Status of INC000011112
Status of INC000011113
Status of REQ000011111

by tagging each identifier (the INC or REQ number) as either incident or request. On top of this training, I used a regex feature to tell LUIS that inc|req followed by a number was meaningful for my business. Capturing the identifier was essential in order to show the status of the incident/request. How do you tackle such scenarios with Watson? I guess you’ll have to do it with custom code, as Watson doesn’t do the job for you. This is yet another truly fundamental difference between the Conversation Service & LUIS.
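For reference, the kind of custom code needed to capture such identifiers yourself could look like the sketch below. It is plain Python post-processing, not the LUIS regex feature itself. The pattern is an assumption inferred from the sample utterances (INC or REQ followed by digits), and `extract_ticket` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Assumed ticket format, inferred from the sample utterances: INC or REQ
# followed by digits. Adjust it to whatever the ITSM backend really emits.
TICKET = re.compile(r"\b(INC|REQ)(\d+)\b", re.IGNORECASE)


def extract_ticket(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (entity type, identifier) for the first ticket found, else None."""
    match = TICKET.search(utterance)
    if match is None:
        return None
    kind = "incident" if match.group(1).upper() == "INC" else "request"
    # Normalise the identifier to upper case before querying a backend.
    return kind, match.group(0).upper()


print(extract_ticket("Status of INC000011111"))
print(extract_ticket("status of req000011111 please"))
print(extract_ticket("I cannot login"))
```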

System Entities

Both IBM & Microsoft come with system entity types that recognize URLs, dates etc. While such tools save you a lot of time, they are of course not specific to your own business. To me, custom entities are far more important, as they allow you to adapt your bot to your specific needs.


IBM’s Conversation Service encompasses a dialog UI enabling you to design basic dialogs without a single line of code:


Dialog entry points are mapped to intents and, from each dialog entry, you can either respond with static answers (text) or use slots to establish a sequence of interactions. As you can see from the above screenshot, there is a notion of context which makes it possible to handle variables across the whole conversation. While this is better than nothing (MS doesn’t come with such features), it’s still far from sufficient. To me, it is absolutely not usable within the enterprise as is. I guess that most companies build chatbots to help users interact with various systems from a single entry point (the bot itself). In such a context, dialogs will most of the time lead to dynamic answers (not static ones) with data coming from external sources. So, your chatbots will have to talk to APIs to perform CRUD operations on various external systems. This isn’t handled at all by the dialog UI. There are not even any hook capabilities, so this is merely a static list of steps with static text answers, and I don’t see how this could be used in a real-world scenario. So, clearly, don’t trust the no-code sales speech!
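To illustrate what a dynamic answer means in practice, here is a minimal sketch of a code-side dialog step that looks up data before replying. Everything in it is hypothetical: the `ticket_status` intent name, the `FAKE_ITSM` dictionary standing in for a real backend API, and the `respond` dispatcher. It is neither the Bot Framework nor the Watson API.

```python
from typing import Callable, Dict

# Hypothetical stand-in for an ITSM backend; a real bot would call an API here.
FAKE_ITSM: Dict[str, str] = {"INC000011111": "in progress"}

Context = Dict[str, str]
Handler = Callable[[Context], str]


def greetings(context: Context) -> str:
    # Static answer: this is the kind of step a dialog UI handles well.
    return "Hello!"


def ticket_status(context: Context) -> str:
    # Dynamic answer: depends on conversation context and on external data.
    ticket = context.get("ticket")
    if ticket is None:
        return "Which ticket do you mean?"  # slot still missing, so ask for it
    status = FAKE_ITSM.get(ticket)
    if status is None:
        return f"I could not find {ticket}."
    return f"{ticket} is {status}."


HANDLERS: Dict[str, Handler] = {
    "greetings": greetings,
    "ticket_status": ticket_status,
}


def respond(intent: str, context: Context) -> str:
    """Dispatch the resolved intent to its handler, with a fallback answer."""
    handler = HANDLERS.get(intent)
    return handler(context) if handler else "Sorry, I did not understand."


print(respond("ticket_status", {"ticket": "INC000011111"}))
print(respond("ticket_status", {}))
```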

Best offering: ? Here I’m a little more puzzled because, while I consider NER a key feature of NLP and it is far better handled by LUIS (see test results later in this article), IBM comes with a single integrated experience and it’s rather easy to get started with it.


Channels are a no-brainer: Microsoft comes with 14 out-of-the-box channels, including the ones supported by IBM, while IBM only connects to 4 channels. On top of this, Microsoft offers the DirectLine channel, which allows any app to create its own channel if required (from Skype for Business on-prem, for instance). This capability should not be underestimated, as channels represent a way to integrate your chatbots with other tools used within the enterprise. I have personally tested the Skype, Skype for Business Online, Teams, Web and DirectLine channels of Microsoft. DirectLine apart, all the other channels require a very small amount of configuration and are live almost immediately. Of course, channels do not all support the bot framework features equally, nor do they offer the exact same user experience, but they represent a very powerful way of integrating bots within your company’s ecosystem.

Best offering: Microsoft

Batch testing of Watson & LUIS against similar models

The above sections were more oriented towards the features exposed by these two products. I also wanted to evaluate how they compare with regard to understanding test data. In order to do so, I built a very basic LUIS model as well as a Watson workspace. Both contain:

  • 3 intents (find_expert, greetings, issue)
  • 1 entity type (expertize)

For the latter, I used a list of values in Watson and a phrase list feature in LUIS. The exact same utterances have been used to train both LUIS and Watson.  For information, here is the exhaustive list of utterances that I injected in the model:

how are you
what's up
who can help me
I forgot my password
I cannot login
my ipad is broken
I can't login
sharepoint is not working
I'm looking for an expert
who can help me with ITIL
I need a SharePoint specialist
I'm looking for an expert in onedrive
my desktop is damaged
I need a COBIT specialist
who can help me with COBIT

So, the purpose here is to get a first glance at their respective performance, not to draw definitive conclusions, as that would require a larger amount of both model & test data (which I couldn’t afford using the trial version of Bluemix). So, I used a very small number of arbitrary utterances to test both models. These are below:

Good morning
Good afternoon
How are you
How are you today
I hope you are fine
Good evening
Good night

I need an expert in SharePoint
I’m looking for a specialist
who can help me with onedrive
who can help me with sap
I’m looking for a sap specialist
I need assistance with sharepoint
I don’t know bluemix so any help from an expert is welcome

my ipad is broken
my iphone doesn’t work
my android is completely out of use
I lost my password
I cannot login anymore

Utterances are split into 3 groups (greetings, find_expert and issue). I used sentences that are slightly different (highlighted in orange) from the ones I used to train both LUIS & Watson. I should point out that I trained them both only once.

So, I sent these utterances to the APIs and I compiled the results in Excel. Here are the results of IBM:


As you probably noticed yourself, lines in red are faulty occurrences. So, we see 5 issues here, with 3 kinds of problems:

  • intent not recognized
  • entity not extracted
  • both

Strangely enough, Watson doesn’t resolve Good night & Good afternoon to the greetings intent, but what I depicted earlier is very visible in these results: Watson correctly resolved “who can help me with SAP” to the find_expert intent but was unable to detect SAP as an entity because SAP isn’t in its list of values. LUIS (see below) identifies SAP although it’s not part of its phrase list feature. To me, that’s a key difference.

Here are the results of LUIS:


As you can see, there are only two mistakes (instead of 5 for Watson): “I hope you are fine” is resolved to issue although it should be greetings, and “I’m looking for a sap specialist” is resolved to find_expert (correct) but sap isn’t extracted this time. In “who can help me with sap”, however, LUIS correctly extracts SAP although it’s not listed in its phrase list feature. It’s able to do so because of the utterance patterns (who can help me with …) and the frequency of those patterns in the model data. Fixing intent resolution faults is handled equally well by both IBM & MS, by merely readjusting using the active learning feature.
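The three failure categories used above (intent not recognized, entity not extracted, both) can be tallied mechanically rather than in Excel. The sketch below assumes each test result has been collected into a small record. The `Case` type and the sample rows are illustrative placeholders and not the measured results from either service.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Set


class Case(NamedTuple):
    utterance: str
    expected_intent: str
    expected_entities: Set[str]
    actual_intent: Optional[str]  # None when no intent was resolved
    actual_entities: Set[str]


def classify(case: Case) -> str:
    """Label one test case with the failure categories used in this post."""
    intent_ok = case.actual_intent == case.expected_intent
    # Every expected entity must be among the extracted ones.
    entities_ok = case.expected_entities <= case.actual_entities
    if intent_ok and entities_ok:
        return "ok"
    if not intent_ok and not entities_ok:
        return "both"
    return "entity not extracted" if intent_ok else "intent not recognized"


def summarize(cases: List[Case]) -> Counter:
    """Count how many cases fall into each category."""
    return Counter(classify(case) for case in cases)


# Placeholder rows for illustration only, one per category.
CASES = [
    Case("Good morning", "greetings", set(), "greetings", set()),
    Case("Good night", "greetings", set(), None, set()),
    Case("who can help me with sap", "find_expert", {"sap"}, "find_expert", set()),
    Case("I need an expert in SharePoint", "find_expert", {"sharepoint"}, None, set()),
]

for category, count in summarize(CASES).items():
    print(f"{category}: {count}")
```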

Azure Bot Service

Instead of creating your bot from scratch, the Azure Bot Service comes with the following advantages:

  • Automatic registration of your bot and its message endpoint
  • Automatic association with LUIS
  • Continuous integration configurable from scratch
  • A single integrated UI to build, configure, test and view analytics of your bot
  • The necessary boilerplate code to get started with a minimal bot

It’s still a preview feature (as many other things discussed in this article), but it surely deserves to be followed up as it looks very promising.

Technical bits

We’ve talked a lot about features so far but what about more technical things? As stated earlier, Microsoft handles conversational aspects with the Bot Framework. IBM handles this within its dialog UI but if your bot has to perform concrete actions such as processing an order for instance, you’ll of course need code as well. Microsoft’s bot framework comes in two flavours: Node.js and .NET.

IBM comes with SDKs for many more languages: Node.js, Java, Swift, Python, Unity and even .NET Core, which helped me get to grips with Watson. All the SDKs from both vendors are open source.

Both IBM & Microsoft have GitHub repositories with code samples and the code of the SDKs.

Bottom line

  • LUIS is more dynamic than IBM’s Conversation Service for the reasons depicted earlier regarding how they extract entities. NER is a key process in NLP.
  • LUIS is much more ML oriented than Watson as LUIS is really analyzing utterance patterns to perform dynamic entity extraction.
  • The Conversation Service offers a dialogue UI which isn’t yet enterprise ready but Microsoft doesn’t offer any dialog UI although this is coming.
  • The bot framework has more out of the box channels than IBM.

If you want to build simple & static chatbots, Watson is way easier than Microsoft as it encompasses everything within the conversation service but when it gets to enterprise-ready chatbots, I’d rather consider Microsoft for the following key aspects:

  • Number of out-of-the-box channels your bots can be accessed from (3 times more channels for Microsoft)
  • Security: this heavily depends on your ecosystem, but a lot of companies, including non-Microsoft-friendly ones, run Active Directory as their user repository. Active Directory is more and more often synced to Azure Active Directory, which is Microsoft’s identity cornerstone in any hybrid/cloud scenario. This is key because, within the enterprise, many chatbots will probably have to identify the users interacting with them. Being in the MS stack clearly simplifies integration with Azure Active Directory. Of course, if your company uses neither Active Directory nor Azure Active Directory, you can simply ignore this :).

PS: in the meantime, I managed to record some videos on Azure Cognitive Services, so feel free to watch them.

Happy AI!

About Stephane Eyskens

Office 365, Azure PaaS and SharePoint platform expert

24 Responses to IBM Watson’s Conversation Service vs LUIS+Bot framework to build chatbots

  1. tyayers says:

    Reblogged this on REFRESHED_WORDS and commented:
    Wow, great comparison!


  2. Vincent Perrin says:

    And leveraging both worlds, BotFramework and Watson Conversation, is an approach to embrace openness:


    • Stephane Eyskens says:

      Hello Vincent, thanks for your comment. All these services are nothing but REST APIs, you can of course also consume LUIS from any IBM component. These services are technology agnostic in terms of consumption.


  3. Vincent Perrin says:

    Watson Developer Cloud provides other services for NLP like Watson Natural Language Understanding & Language Classifier. Both can be used as standalone services or linked to Watson Conversation to improve Intent & Entity detection.


    • Stephane Eyskens says:

      Hello Vincent, thanks for your comment! I know that Watson encompasses other services, as Azure Cognitive Services do. However, I was comparing those two only in the context of a chatbot. Having to combine the Conversation Service with Natural Language Understanding creates an extra hop, while it should have been built into the Conversation Service out of the box IMHO. If a Conversation Service, which by design deals with natural language, doesn’t do it on its own, it’s a bit of a pity.


      • Zoki says:

        Totally unfair to say MS comes with better NLP if you did not mention the Watson NLU service, which even supports building your own models for NER using Knowledge Studio (no coding required). If you think it should be part of WCS, then you have to exclude LUIS and only use the Bot Framework. In summary, include all the tools that a vendor provides to do a certain job, then do a comparison.


      • Stephane Eyskens says:


        Please read this article carefully before commenting! In this blog post, I only compared LUIS & IBM Watson’s Conversation Service. It is in no way a full comparison of Watson & the Cognitive Services.
        By the way, the Cognitive Services also come with powerful NLP stuff besides LUIS.

        Best Regards


  4. Guilherme says:

    I have practical experience with both solutions and I’d like to point some things out, but first consider this:
    – 430+ intents
    – pt-BR
    – Chatbot for any users that enter the client’s portal
    Now, these are the problems we’re experiencing:
    1. In Watson we got an accuracy of 94%, which we haven’t reached with Luis so far. The same user sentences give bizarre behaviour in Luis (such as your example with “i hope you are fine”), but it happens too often.
    2. BotFramework + DirectLine has a ~1.5s latency in US, if we try to use any other datacenter, we got up to ~5s latency. (Calling Luis directly is very fast, ~100-200ms)
    3. Using an application we developed (in this case, using Java) we got a performance of up to 200ms over the NLP. In other words, Our App + Luis = ~500ms, against 1.5s in BotFramework
    4. I thought your observations about the Conversation Dialog were very poor. We use our own app to integrate with other services on top of the Conversation Dialog, but we have some non-tech people in the team that can maintain the flows of conversation through Watson. Using BotFramework, a developer has to be involved.
    5. Our App + Watson Conversation = ~700ms response time

    I really liked your observations about NER though. I will test it more extensively in Watson NLU and Luis!


    • Stephane Eyskens says:

      My observations are what they are; the methodology I used is crystal clear, nothing more, nothing less. Instead of simply stating that it’s poor, I invite you to write a similar blog post, describing the exact methodology, providing your 430+ sentences (did you train LUIS by the way??) and pushing some code so that everybody can take their own metrics. What I state in the blog post can be reproduced by anyone.


  5. har says:

    Nice article


  6. ericranesakbar says:

    Great post, its inspired me. can you visit our website Qiscus. Don’t forget to read our article about and please give us some comment.


  7. Preetish Chindarkar says:

    I read your full article, but as you are pushing LUIS so much I think you are a Microsoft fan… My personal experience with LUIS is very bad. The other thing is that LUIS is still in beta and only God knows when it will go live. There is also no conversation context functionality in LUIS (which is most important for chatbots), although Microsoft has shown in its bot demos that LUIS recalls the past history of the conversation, which is not the case in a real scenario. I don’t know about your test, but I personally found LUIS weak in intent matching, and it does not provide features like conversation context, actions, dialog flow, etc. … so I am wondering why you still support LUIS.


    • Stephane Eyskens says:

      Not only do I fully disagree, but I should tell you that since this article, I have written a couple of extra chatbots that work like a charm. So, the main point of my article is that LUIS is able to tackle dynamic entities and the Conversation Service is not, and that is key. I’ve never seen any real-world scenario that requires only static lists of entities, really never! So, if the chatbots you work on are simply FAQs, then you don’t even need to detect an intent or an entity…


      • Fred says:

        Only my two cents.
        I think NER is not necessary to build conversation services for most enterprises (micro and small businesses). Small businesses have very well-defined entities, and besides, the entities are few in number. That is my case. I have three “little” projects on my hands and I don’t need NER. They need context control.
        But your article is very good.
        I think the big problem is actually to hold a convincing conversation like a personal assistant (the user or client wants this), however simple the chatbot may be.


    • Stephane Eyskens says:

      Also, I forgot to add one thing: what I depicted in this post is completely repeatable, so feel free to perform the same tests. It’s not a question of being a fan or not; these are observations based on real tests. I don’t claim to have made an exhaustive list of tests but at least what I did is clear. Feel free to come up with another set of tests that shows a better performance of the Conversation Service!


      • teknagu says:

        We embedded regex for recognizing entities and they work like a charm in Watson. You may be right that it’s still coding, but you just open the JSON editor embedded in Watson, add this one line and go back to the visual UI.


  8. Hello, first let me clarify that LUIS accepts not just 80 intents but 500 per application.
    Besides, accepting too many intents is not that useful, because the service can get confused with many phrases, so Microsoft recommends grouping intents into similar categories and separating them into different applications so that the service has a better understanding of the language model. Regards


    • Stephane Eyskens says:


      Thanks for your comment. Indeed, LUIS boundaries have moved since I wrote this post. That’s why, whenever I write a blog post, I often say “at the time of writing…” because I can’t keep updating all my posts whenever something changes. As you know, with PaaS offerings, things change almost on a weekly basis 🙂


  9. Daniel Schind says:

    Transforming an 80 intent limit (LUIS) vs 2000 intents (Watson Conversation Service) into an “advantage” for LUIS and saying “Don’t think I will privilege Microsoft because I’m an MVP” can’t be in the same post. I have experience building successful bots for several customers and you can easily go over 200+ intents and even more depending on the use case.
    One of the problems of being biased is that you are not always aware of it


    • Stephane Eyskens says:

      I invite you to write a similar blog post to show how much better Watson is, using concrete examples and a clear methodology.


      • Stephane Eyskens says:

        By the way, as highlighted by a visitor, LUIS now accepts up to 500 intents, not 80 anymore and LUIS also comes with a conversation designer but I don’t update the post whenever something changes or I would never stop. If I was that biased, I guess I’d rush to highlight any progress made by LUIS…


  10. Watson’s Dialog UI now does hook into anything you like to do API calls (be it your own app, or an IBM Cloud Function):

