There is an increasing emphasis on natural user interfaces in modern applications. As systems become more and more complex, it is an understandable desire to hide some of this complexity from the user and enable a dynamic, easy-to-use and intuitive interfacing solution. Examples of this include gesture control, various sensor-aided solutions (like motion, position and eye tracking) and natural language understanding. In this article, I am going to present a tool for the latter in the form of Microsoft’s LUIS.

Overview

LUIS stands for Language Understanding (also a play on [Natural-]Language User Interface – [N]LUI) and it is part of the Microsoft Cognitive Services family. It helps you create language-understanding models that continuously improve through machine learning.

LUIS aims to collect valuable core information from what the user says, matching it with a predefined intent (the assumed goal of the user) and finding entities in it. Developers can then use this information in code and act upon it. LUIS can also be integrated with the Azure Bot Service, thus enabling the creation of sophisticated bots.

You can visit the US LUIS site here, or the EU site here.

Building a LUIS model

A LUIS model is what the machine learning system will use to try to interpret what the user says and to collect further information from natural speech.

Each model has two important parts:

  • Intents: These are the basic building blocks of a model. Each intent corresponds to a specific user intention. LUIS will use machine learning and weigh each intent against what the user has said to select the most fitting one. Each intent has “utterances”, which are sample sentences that a user might say to trigger said intent.
  • Entities: Used like variables, these capture valuable additional information that contextually corresponds to an identified intent.

For example, look at the following natural language sentence:

“I would like to enable dark mode.”

LUIS could match this to an “Enable” intent, with “dark mode” as an entity. Our application would then know that the user wants to turn a setting or configuration on (enable) and that the particular option is “dark mode”, so it would change the style of the user interface accordingly.
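
If the prediction comes back as intended, acting on it in code can be as simple as branching on the intent and reading the entity. A minimal, purely hypothetical sketch (the intent and entity names, and the Theme API, are assumptions for illustration):

// Hypothetical handler: the predicted intent selects the action, the entity supplies the parameter
void OnPrediction(string topIntent, string entityValue)
{
    if (topIntent == "Enable" && entityValue == "dark mode")
    {
        // Hypothetical settings/theme API of our app
        Theme.Apply(Theme.Dark);
    }
}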


Or LUIS could instead recognize it as a “Like” intent, and our app could issue a like, share and follow on our Facebook page. I’m sure our application would deserve such treatment, but if we don’t want to annoy the user, here is what we can do to make sure LUIS nails it most of the time.

Identifying the right intents

First of all, we need to decide upon the intents our model will have. This is a highly important and delicate step, as LUIS will weigh each and every intent against the user’s words and decide the outcome by scores. The trick is not in the number of intents, but rather in the conceptual “distance” between them. In other words, the more the usual wording of what a user might say for intent A differs from that for intent B, the better.


For example, we might have two different services and separate business logic in our fast food delivery application for ordering a pizza and ordering a hamburger, but when the user phrases it, it will be something like “I want to order an extra-large pepperoni pizza” and “order a cheeseburger”. At first glance it may be tempting to create two different intents for pizza-order and hamburger-order, but as your service and app grow and you introduce more foods and variables, these intents will drift closer and closer to each other, resulting in an increasing error rate in LUIS.

It would be better to just have a single “Order” intent and put the other information into entities, like so:

“I want to order an extra-large pepperoni pizza”

“order a cheeseburger”

Here are some tips on how to select distinctive intents:

  • Think about intents as actions and entities as variables (additional information) for these actions
  • Base intents around what a typical user would say to achieve a goal, not around how your internal data models / services / business logic / architecture are built
  • Follow through your typical control flows and decide where you want to enable the natural language interface (for example, filling out a detailed, long form might not be the best use case, in contrast to app-wide “OK – Cancel – Retry” utterances)
  • Experiment (for example, you can define two intents for a “do” and a “don’t” scenario, or you can group them into one, but watch out for negating words as entities – see what works best!)

The “None” intent

Every model comes with a “None” intent by default. This is used by LUIS to group everything that is outside your application’s domain. If the model didn’t have it, LUIS would try to force a valid intent onto every user utterance, which is not the desired behavior. None lets LUIS signal that it probably encountered something that was not meant as a voice command (or that our model needs improvement).

It is considered best practice to put at least 10% of all example utterances into the None intent, or at the very least one utterance for every other intent. You can put some unrealistic (and funny) utterances here that don’t have the slightest chance of corresponding to any real user scenario, but it’s best if you come up with examples that are close to, yet differ significantly from, your intents, or something the user might say but can safely be assumed not to have meant as a voice command. E.g.:

  • “do you know why the answer is 42?”
  • “book a flight” (if your app handles book renting and has nothing to do with airplanes)
  • “damn, this app is so good”

Adding entities

The next step in building a model is defining entities. As I mentioned above, entities are much like variables; they add detail for our application to use. An important distinction between entities is whether they are machine-learned from context or not. The latter are defined by the developer for exact or pattern matches and will not be modified by LUIS.

There are a few types of entities with regard to how they function, which are the following:

  • Simple: describes a single concept
  • Composite: represents an object that has subparts (as in our example above, “extra-large” and “pepperoni” might form a composite entity describing the parameters of a pizza order: size and type)
  • List: a fixed “list of lists”, i.e. a collection of synonyms (e.g. a countries entity might have a fixed set of the countries that make sense in our application, and each of them could be substituted with its full name or country code)
  • Regex: matches a defined regular expression
  • Pattern: a variable length placeholder that can be used in a pattern’s template utterance to mark where the entity is supposed to be

List, regex and pattern entities are non-machine-learned entities.

Improving performance

As an active machine-learning model, LUIS might require some work to maintain precise functioning. Luckily, the portal offers some useful tools and insights into how our model performs and how to improve upon it.

In the dashboard, we get a summary: a training evaluation that shows how many queries were predicted successfully and how many were unclear or incorrect. The portal also lists the top examples of these intents, which we can then improve on.

An important metric here is called data imbalance, which points out if an intent has significantly fewer example utterances defined than the rest of the intents. This can lead to imbalanced weights when LUIS tries to match a sentence to an intent.

The portal will also tell us the prediction metrics by intents, highlighting the most problematic ones.

Selecting a problematic intent, we can open a detailed view, which offers useful insights. The portal lists the score for each utterance, which is the value LUIS predicts for the expected intent. It is compared to the nearest intent score, which is the top score among all the other intents, i.e. the ones other than the one we expect. If the difference is below zero, LUIS will incorrectly predict utterances very similar to the one in question, and if it is very close to zero, the prediction will be unclear.
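
To make the comparison concrete, here is a minimal sketch of the logic described above (the variable names, example values and the 0.15 margin are illustrative assumptions, not portal values):

// expectedScore: score LUIS gave the utterance for the intent we expect
// nearestRivalScore: the top score among all the other intents
double expectedScore = 0.48;
double nearestRivalScore = 0.52;

double difference = expectedScore - nearestRivalScore;

if (difference < 0)
    Console.WriteLine("LUIS prefers another intent; very similar utterances will be mispredicted.");
else if (difference < 0.15)   // small, arbitrary margin; tune it to your model
    Console.WriteLine("The prediction is unclear; consider adding more example utterances.");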

Managing your LUIS app

In the manage view, we can administer our LUIS model and its endpoints. An endpoint is used to query the service, which we can do from code. You can set and query basic information about your model here, like the name, description, app ID, etc.

Since LUIS integrates deeply with Microsoft Azure, an Azure resource will be needed to use it. This can be set to different pricing tiers, which you can read more about here. An authoring resource will also be needed for the administration functions. It is important that the authoring and prediction resources share the same location (either inside the USA or not), because each LUIS portal only accepts the resources of its corresponding region.

You can create multiple versions of a LUIS model, which can be cloned from each other. You can also select which version is active on the portal and which is published to the endpoint. This makes it easy to test new features and intents and to introduce refactors without the fear of breaking something that works in the live environment.

Integrating LUIS in a .NET environment

The LUIS endpoint accepts queries over HTTPS on a specific URL, which can be acquired from the Manage -> Azure Resources menu. To identify valid and authorized requests, it uses keys (as many other Azure resources do) that need to be supplied with the request. The query itself is given as the URL parameter “q”. The endpoint will then process the request and send back the data as JSON. A basic query would look like this:

using (var client = new HttpClient())
{
    // Authenticate the request with the key of the Azure prediction resource
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", SUBSC_KEY);

    // ENDPOINT_URL is assumed to already end with the "q=" parameter; QUERY is the user's (URL-encoded) utterance
    var response = await client.GetAsync(ENDPOINT_URL + QUERY).ConfigureAwait(false);
    var responseString = await response.Content.ReadAsStringAsync().ConfigureAwait(false);

    // Deserialize the JSON payload into our result DTO (using Newtonsoft.Json)
    var obj = JsonConvert.DeserializeObject<ResultDTO>(responseString);
    return obj;
}

The most important data that LUIS will return (a possible DTO mapping is sketched after this list):

  • Query: the same query that we asked from LUIS
  • TopScoringIntent (TSI): this is basically the winner, the one LUIS matched the query to
  • Score of TSI: this tells us how confident LUIS was
  • Entity list: contains each recognized entity and its details (beginning and ending character index, type and name of the entity, and its resolution)
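
Based on these fields, a possible set of DTO classes could look like the sketch below. The property names are chosen to match the snippets in this article; check the official V2 response documentation for the exact JSON schema.

// Requires Newtonsoft.Json (the library used in the query snippet above)
public class ResultDTO
{
    [JsonProperty("query")]
    public string Query { get; set; }

    [JsonProperty("topScoringIntent")]
    public IntentDTO TopScoringIntent { get; set; }

    [JsonProperty("entities")]
    public List<EntityDTO> Entities { get; set; }
}

public class IntentDTO
{
    [JsonProperty("intent")]
    public string Intent { get; set; }

    [JsonProperty("score")]
    public double Score { get; set; }
}

public class EntityDTO
{
    [JsonProperty("entity")]
    public string Entity { get; set; }

    [JsonProperty("type")]
    public string Type { get; set; }

    [JsonProperty("startIndex")]
    public int StartIndex { get; set; }

    [JsonProperty("endIndex")]
    public int EndIndex { get; set; }
}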

Note that since the V3 API, the query, its available parameters and the response format have changed in some meaningful ways. For example, a V3 JSON response contains all intents with their corresponding scores of how likely LUIS thinks each one matches the query, which is great if you want more in-code control over the predictions, or simply more information for detailed logging and error handling. You can read more about the specific result formats for the V2 API here and the V3 API here.
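
For reference, a V3-style DTO could look something like the sketch below. This is an assumption based on the V3 documentation (the prediction is nested and every intent carries its own score), so verify the exact shape against the docs before relying on it.

public class PredictionResponseV3
{
    [JsonProperty("query")]
    public string Query { get; set; }

    [JsonProperty("prediction")]
    public PredictionV3 Prediction { get; set; }
}

public class PredictionV3
{
    [JsonProperty("topIntent")]
    public string TopIntent { get; set; }

    // Every intent with its own score, e.g. for detailed logging
    [JsonProperty("intents")]
    public Dictionary<string, IntentScoreV3> Intents { get; set; }
}

public class IntentScoreV3
{
    [JsonProperty("score")]
    public double Score { get; set; }
}

Iterating over Prediction.Intents then gives you every candidate intent with its score, which is handy for logging or custom fallback logic.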

And basically this is it! Set up a model class or DTO based on the response JSON documentation, send an HTTPS query to the endpoint with the user’s text (or speech-to-text converted oral input) and deserialize the response with a solution of your preference. Note that there might be some useful repos or ready-made solutions out there for integrating LUIS into .NET (and other) infrastructures; I haven’t really checked them out, due to the ease of use demonstrated above, but feel free to experiment with them.

And lastly for this section, I’d like to point out a few tips that might be helpful for coding with LUIS:

  • It is generally a good idea to keep the LUIS service call stack async, since, you know, HTTP communication, latency and such (however, I found that LUIS has a pretty low response time even with a model of 40+ intents).
  • Keep track of how many queries you send per second and per month based on your pricing tier, and be prepared in code to handle a 403 – Forbidden response if you overstep it. This can especially happen when you run unit tests on intents asynchronously or when multiple users utilize the same LUIS endpoint through your backend.
  • LUIS is not all-knowing. Don’t be afraid to utilize other methods as well to interpret natural language, like another similar service run in parallel with LUIS, or even some basic in-code, non-learning solutions. The latter can be exact matching for known, well-defined and highly frequent utterances (there is no need to make LUIS interpret an “OK” or “cancel” in 99% of cases, but a “well… umm… yeah, sure” is another matter), or pattern matching (see the sketch after this list).
  • However, don’t overdo these alternative solutions if you want LUIS to learn and update its model. Pattern matching with predefined word lists can catch almost everything a user says and force an intent onto it, but it will never learn, adapt or use context. Use such solutions as fallbacks, not as substitutes.
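
As an example of the last two points, here is a sketch of a simple, non-learning shortcut layered in front of LUIS. The dictionary contents and the GetIntent helper are assumptions (the helper is presumed to wrap the HTTP snippet shown earlier):

// Known, high-frequency utterances are resolved locally; everything else goes to LUIS
private static readonly Dictionary<string, string> ExactMatches =
    new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
    {
        { "ok", "Confirm" },
        { "cancel", "Cancel" },
        { "retry", "Retry" }
    };

public static async Task<string> GetIntentName(string query)
{
    // Local shortcut: no LUIS round-trip (and no quota used) for trivial commands
    if (ExactMatches.TryGetValue(query.Trim(), out var intentName))
        return intentName;

    // Everything else is sent to LUIS, so the model keeps seeing real utterances
    var result = await Utility.GetIntent(query);
    return result.TopScoringIntent.Intent;
}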

Testing

Even though the portal offers a lot of insight into how our model performs, in some cases external testing might still be useful. It is a trade-off between regularly allocating small portions of time to analyzing and acting upon the portal’s suggestions, and creating a robust test framework in code once, which can easily be expanded to cover new intents and can be run in a few seconds with a single click (okay, a few clicks in VS). But...

I’d recommend doing both of the above, for the following reasons:

  • Setting up a unit test framework once and using it for every new intent or entity added (once our model is fairly mature and adding new intents doesn’t happen every day) helps us verify that the model hasn’t regressed in prediction accuracy.
  • Using the portal enables us to tackle problematic parts of the model and gather insights which a simple unit test cannot provide, like statistics and data imbalance.

Of course your mileage may vary: you might not need to make LUIS so bulletproof, or your project might not have the time needed for a testing framework, and in that case just stick with the portal-based analysis once in a while.

Anyways, here are some tips and code snippets to make such a framework… work:

public async static Task<ResultDTO> GetValidatedResult(string query)
{
    // Query the LUIS endpoint with the test utterance
    ResultDTO result = await Utility.GetIntent(query);

    // Basic sanity checks on the response (see below)
    TestQueryMatch(query, result);
    ValidateResult(result);

    // Optional: refuse results below the TSI score threshold
    if (VALIDATE_INTENT_SCORE)
        ValidateIntentScore(result);

    return result;
}

This function gets called for all the test queries of an intent. It is only responsible for providing a valid ResultDTO. That means doing basic sanity checks, like checking that the returned query matches the given one, the result is not null, the TSI is not null, the score is not 0 and the TSI is not the “None” intent. Basic stuff you can easily do with asserts, so I won’t provide those code snippets, just a basic understanding of the code flow.

I recommend, however, experimenting with a TSI score threshold. This means that our test framework (and app) could refuse results that are below a certain TSI score. As you improve your model, you can raise this threshold. A good baseline is 0.25. We can also make it so that the developer can turn this check off with a single bool flag.
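
A possible implementation of that check might look like this sketch (the threshold value, the flag and the assert message are suggestions, not prescriptions):

// Sketch of the optional TSI score threshold check
private const double SCORE_THRESHOLD = 0.25;     // raise this as the model improves
private const bool VALIDATE_INTENT_SCORE = true; // single flag to switch the check off

private static void ValidateIntentScore(ResultDTO result)
{
    Assert.IsTrue(result.TopScoringIntent.Score >= SCORE_THRESHOLD,
        $"TSI score {result.TopScoringIntent.Score} is below the threshold for query \"{result.Query}\".");
}

With the validation in place, the next helper compares the expected and the actual result: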

public static void TestIntent(ResultDTO expected, ResultDTO actual, bool testGeneratedEntities = false)
{
    // Testing TSI match
    Assert.AreEqual(expected.TopScoringIntent.Intent, actual.TopScoringIntent.Intent, "Your error message here for TSI mismatch.");

    if (testGeneratedEntities)
    {
        // Optionally generate the expected entities from the query itself and compare them
        var generatedEntities = Utility.GetEntitiesFromQuery(actual.Query);
        expected.Entities = generatedEntities;

        TestEntities(expected, actual);
    }
}

Now comes the semantic validation. Of course we check the TSI against the expected one, but we make checking all the entities optional. This is because defining each entity for each test utterance is much harder and, in some cases, unnecessary. However, the testing framework supports generating these dynamically; this can be achieved using a static dictionary or in-memory dataset that the developers maintain for the key entities.
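
One way to do this is a small lookup that scans the query for known phrases. The sketch below (with hypothetical entity names and types) shows how Utility.GetEntitiesFromQuery could be implemented along those lines:

// Sketch: developer-maintained mapping of known entity phrases to their entity type
private static readonly Dictionary<string, string> KnownEntities =
    new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
    {
        { "living room", "Room" },
        { "lights", "Device" }
    };

public static List<EntityDTO> GetEntitiesFromQuery(string query)
{
    var entities = new List<EntityDTO>();

    foreach (var pair in KnownEntities)
    {
        var index = query.IndexOf(pair.Key, StringComparison.OrdinalIgnoreCase);
        if (index < 0)
            continue;

        // Build the expected entity the same way LUIS reports it (character indexes assumed inclusive here)
        entities.Add(new EntityDTO
        {
            Entity = pair.Key,
            Type = pair.Value,
            StartIndex = index,
            EndIndex = index + pair.Key.Length - 1
        });
    }

    return entities;
}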

And so a single unit test becomes this easy:

[TestMethod]
public async Task TurnOn()
{
    ResultDTO expected = Helper.AsResultDTO("TurnOn");
    Helper.TestIntent(expected, await Helper.GetValidatedResult("please turn on the lights in the living room"), true);
    Helper.TestIntent(expected, await Helper.GetValidatedResult("turn on all lights"));
    Helper.TestIntent(expected, await Helper.GetValidatedResult("switch on the lights in the house"));
}

Summary

We’ve seen how LUIS can help implement a natural language interface in a modern application, which is a steady trend nowadays. We’ve also learned how a LUIS model is built and what the best practices and pitfalls are in constructing one.

Safe to say that, if used properly, natural language understanding can really boost the usability of our app, but it’s not a magic solution that applies to every case and environment. However, as IT enthusiasts, it’s always exciting to try out something new, and I hope your app will find its new friend in LUIS.