March 2, 2023

How to build a chatbot that doesn’t suck


4 lessons I learned from building over 50 chatbots on 5 continents

Almost everyone agrees that automated conversations can add value. As a customer, it’s much easier if you can communicate with a company or organization just as you do with your friends and family. No one likes to sit on hold in a telephone queue or wait three days for a response to their email. 

Instead, if AI can recognize your question and either resolve it or transfer it to the right person, the added value for both the user and the business is immense. That’s why “going conversational” is the next step forward in the way companies interact and communicate with their customers. Yet for a technology with so much potential, I always wonder: why are there so many unsuccessful voice bots and chatbots?

If you own a computer or a smartphone, it’s safe to say you’ve had a conversation with a chatbot that made you so frustrated you wanted to throw your device out the window. Bots are meant to make people’s lives easier, not add problems, so why are so many people’s interactions with bots so frustrating? The technology is there; it’s the implementation approach that needs updating.

Over the last two and a half years, I’ve built more than 50 chatbots for companies across industries in Europe, Africa, Asia and the Americas. As with anything, there are common mistakes I see teams repeat when tackling a new chatbot project. There are four big lessons I’ve learned, and I promise that if you stick to them, you’ll not only build a bot that doesn’t suck, you’ll build a bot that solves problems for your users and makes them feel heard and understood.

Lesson 1: Anticipate the out-of-scope

A chatbot project is an AI project, but frequently it’s run like a traditional IT project. The best way to explain the difference in approaches to a new project team is by comparing building a chatbot to building a website. 

A website has a predefined scope. The team agrees on the intended users, designs journeys to match their objectives and essentially decides which buttons appear where and what happens when the user clicks. The user’s actions are finite. 

Once there’s a V1 of the website, the team tests to make sure everything works cross-device, that the buttons lead where they’re supposed to, etc. They correct any bugs and set the site live!

Now building a chatbot is an entirely different story. When you build a chatbot, you start with a scope. The team agrees on an information architecture, defines the bot’s knowledge and responses, trains the NLP model, etc. However, we can’t define the user journeys, because by giving the user a keyboard, the user’s actions become infinite. 

A visual comparison between the defined, finite user interactions in an IT project vs. the infinite user interactions in an AI project

A keyboard opens up the possibility for a user to say literally anything and everything. This highlights the beauty and the potential of AI; instead of trying to understand how a website works and falling into a predefined journey, the user defines their own journey and experience with the chatbot. 

This means that even though we decide on an initial scope, people can still say whatever they want outside of this scope. And that can create issues that I refer to as false positives and true negatives.

Example 1: False positives

Let’s say someone asks something that’s outside your scope. You can build your happy flows as well as you like, but an out-of-scope question can very easily match an intent that you do have in scope. And the better and more robust your in-scope NLP model is, the higher the chance that your user gets a false positive.

Example of an NLP model giving a false positive

Example 2: True negatives

When someone says something that’s out of scope and the bot correctly doesn’t understand it, that’s a true negative. The bot doesn’t know how to answer the question and usually responds with some sort of “not understood” message that asks the user to rephrase their question.

However, the phrasing isn’t the issue, and rephrasing the question won’t solve the problem, because the question is out of scope and genuinely not understood. The user and the bot end up stuck in a “not understood” loop.

Example of an NLP model handling a true negative

So, how can you avoid false positives and plan for true negatives? First, create what I refer to as “99-intents.” These are intents that capture categories and topics that aren’t in scope. Their main purpose is to attract out-of-scope questions and define the correct next steps for the user.

Second, make sure you correctly handle “not understood.” Don’t ask the user to rephrase. Instead, focus on how you can get the user to a solution as quickly as possible. I’ll dive into this more in Lesson 4.
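To make the idea concrete, here’s a minimal Python sketch of routing logic that combines a confidence threshold with catch-all “99-intents.” The intent names, the 0.65 threshold and the classify() stub are illustrative assumptions, not part of any specific NLP platform or our actual setup.

```python
# A minimal sketch: route a message using the NLP confidence score plus
# catch-all "99-intents" for known out-of-scope topics.

CONFIDENCE_THRESHOLD = 0.65   # below this, treat the match as "not understood"
OUT_OF_SCOPE_PREFIX = "99_"   # naming convention for out-of-scope catch-all intents


def classify(message: str) -> tuple[str, float]:
    """Stand-in for your NLP engine; returns (intent_name, confidence)."""
    raise NotImplementedError


def route(message: str) -> str:
    intent, confidence = classify(message)

    if confidence < CONFIDENCE_THRESHOLD:
        # True negative: don't ask the user to rephrase, escalate instead (see Lesson 4).
        return "safety_net_flow"

    if intent.startswith(OUT_OF_SCOPE_PREFIX):
        # A 99-intent attracted an out-of-scope question, so we can point the
        # user to the right channel instead of giving a confident wrong answer.
        return "redirect_flow:" + intent

    return "in_scope_flow:" + intent
```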

Lesson 2: Organize intents into a hierarchy

Let’s use a banking bot for this lesson. A customer types that they lost their card, and the bot asks which card: debit or credit. The bot needs to know which card the user lost because the process is different depending on the card type. Great! Nothing bad happened yet :)

A chat conversation where the bot asks the user a follow up question to gather enough information to ensure it gives the most accurate response

Now the next user says that they lost their credit card, and again the bot asks which card they lost. The bot correctly recognized the intent, but it’s asking for additional information that the user already gave. This makes the user feel like the bot didn’t properly understand them because no one likes to repeat themselves. 

A chat conversation where the bot asks the user a follow up question that the user already answered in their initial message

The third user types that they lost their Amex Gold. The bot follows up by asking which card and the user repeats Amex Gold. The user is now stuck in a loop because this bot only expects debit or credit as an answer and doesn’t realize Amex Gold is a credit card.

A chat conversation where the bot repeatedly asks the user a follow up question that the user already answered in their initial message

To make sure this doesn’t happen, you need to organize your content into a knowledge hierarchy. It goes from very broad to very specific. 

The knowledge hierarchy written out from broad to specific for the "lost something" intent

When a user asks a question, we want the bot to give the most specific answer possible. For example, if someone says that they lost their Amex Gold, we don’t want to ask them if they lost their debit or their credit card. 

It’s this ability to immediately narrow in on the specific that makes AI so useful. Not only can users talk the way they normally do, they feel heard and immediately receive an answer to their specific problem. If this were a website, they’d have to click 1, 2, 3, 4, 5 or 6 times to reach the Amex Gold information.

So how can you make sure the bot reaches the deepest level of information within a given intent? First, group all the similar intents and expressions together. Then, organize them from broad to specific. Finally, use a combination of intents and specific entities to recognize the request and figure out how “specific” the answer should be.
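As a rough illustration, here’s a small Python sketch of such a hierarchy for the “lost card” example. The card names, entity values and the resolve_card helper are assumptions made for the example, not a real bank’s configuration.

```python
# Illustrative broad-to-specific knowledge hierarchy for the "lost card" intent.
CARD_HIERARCHY = {
    "lost_card": {                       # broad level: "I lost my card"
        "debit": ["maestro", "visa debit"],
        "credit": ["visa classic", "mastercard gold", "amex gold"],
    }
}


def resolve_card(entities: list[str]) -> tuple[str | None, str | None]:
    """Return (card_type, specific_card) based on what the user already said."""
    for card_type, cards in CARD_HIERARCHY["lost_card"].items():
        for card in cards:
            if card in entities:
                return card_type, card   # most specific level: exact card
        if card_type in entities:
            return card_type, None       # middle level: debit vs. credit
    return None, None                    # broad level: ask a follow-up question


# "I lost my Amex Gold"    -> ("credit", "amex gold"): answer immediately.
# "I lost my credit card"  -> ("credit", None): skip the debit/credit question.
# "I lost my card"         -> (None, None): now the follow-up question is justified.
```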

Lesson 3: Expect unusual expressions

Keeping the banking theme, let’s start with an example that most banks in Belgium struggle with. We’re making an intent for users requesting a new version of this little device 👇

An illustration of a banking card reader and examples of different ways people try to ask for a new one

What would you call this? Most people would say “card reader.” Some say bank device, pin machine, digipass… In Belgium, people also say ‘bakje’ or ‘kaske,’ which mean ‘small tray’ or ‘container.’ Some people don’t even use a specific word for it; they just say something like “the thing I use to pay online.” When you start interacting with your customers, you realize that everyone expresses themselves differently.

As humans, it’s pretty easy for us to understand “the thing I use to pay online,” because we have context and can quickly figure out what the person means. However, a bot only knows what you teach it, so you need to make sure that all these variations are part of your AI model. This applies to single words, but also to sentences and situations. For example, when someone wants to express that their card is broken, they might say that while getting out of their car, they dropped their card in the door and closed it. 
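As a rough illustration, the variations above could be captured as a synonym list for a “card reader” entity. The list and the mentions_card_reader helper below are hypothetical and intentionally incomplete; a real NLP model handles typos and fuzzy matches far better than a substring check, and real synonym lists grow from reviewing actual user messages.

```python
# Illustrative synonym list for a "card reader" entity.
CARD_READER_SYNONYMS = [
    "card reader",
    "bank device",
    "pin machine",
    "digipass",
    "bakje",                          # Flemish: "small tray"
    "kaske",                          # Flemish: "small box/container"
    "thing i use to pay online",
]


def mentions_card_reader(message: str) -> bool:
    """Very rough check; a real NLP model would handle fuzzy matches and typos."""
    text = message.lower()
    return any(synonym in text for synonym in CARD_READER_SYNONYMS)
```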

This brings up the next question: how far do you need to go with this? In the above example, do you really want the bot to understand this? If you add this to the AI model, you’ll have to include all kinds of words that have nothing to do with losing a card. And later on, as soon as someone mentions a car or a door, the AI will start thinking of a broken card. That might not be the result that you’re aiming for. 

There’s always a tradeoff when adding unusual words and expressions that diverge from the more straightforward ones, because you don’t want to take it too far. If a human would have trouble understanding what a person really means, there’s no way we can expect a bot to understand.

You should aim to train your NLP model to recognize 90% of the incoming user questions that are in scope. Depending on the bot and scope, you can get up to a 95% recognition rate, but I typically never train above that since the remaining 5% are usually edge cases and exceptions. 

So how can you anticipate all the creative ways users will describe an item or situation? It’s all about testing and optimizing. When you first launch your bot, your NLP model will recognize ~70% of incoming, in-scope questions. You’ll need to review the incoming questions, update the NLP model with new expressions, set it live and repeat until you reach 90% recognition. Remember, a chatbot is an AI project that uses confidence scores, which means nothing is ever black and white. 
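As an illustration of that optimization loop, here’s a small Python sketch that computes the recognition rate from logged conversations. The log format and the 0.90 target are assumptions based on the figures above, not the output of any particular chatbot platform.

```python
# Measure what share of in-scope messages the model recognized, then review the misses.
def recognition_rate(logged_messages: list[dict]) -> float:
    """Each logged message: {"in_scope": bool, "recognized": bool}."""
    in_scope = [m for m in logged_messages if m["in_scope"]]
    if not in_scope:
        return 0.0
    recognized = sum(1 for m in in_scope if m["recognized"])
    return recognized / len(in_scope)


logs = [
    {"in_scope": True, "recognized": True},
    {"in_scope": True, "recognized": False},   # review this one and add expressions
    {"in_scope": False, "recognized": False},  # out of scope: not counted against the model
]

if recognition_rate(logs) < 0.90:
    print("Review unrecognized in-scope messages, add expressions, retrain, repeat.")
```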

Comparing how people think launching a bot works vs. how actually launching a bot works


Lesson 4: Remove “Sorry, I don’t know”

Often, when a bot doesn’t understand someone, the reply is “Sorry, I don’t know the answer.”

The problem with that approach is that it doesn’t help the user, and it creates a negative experience. For example, say your wifi doesn’t work, so you contact your provider via their chatbot. If the bot doesn’t understand what you said and responds with “Sorry, I don’t know,” you now have two problems: your wifi doesn’t work and the bot doesn’t understand you.

The difference in a user's emotions before and after using a bad chatbot

This goes together with what I mentioned earlier about in-scope and out of scope expressions. If someone says something that’s out of scope, the bot needs to realize that and provide an alternative solution. 

Going back to the wifi example: instead of saying “I don’t know,” the bot can direct the user to an FAQ page, provide a link to a video that walks them through resetting the modem, give them the customer support phone number, offer to connect them to a live agent, and so on. There are so many ways the bot can help the user get closer to a solution, even if it can’t solve the problem on its own. By helping the user get closer to a solution, the bot adds value rather than frustration.

At Campfire AI, we always develop an elaborate flow of safety nets. That way we make sure we’ve exhausted every possible way to help the user before wrapping up. 
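As a rough illustration of the idea, here’s a minimal Python sketch of an escalating “not understood” flow. The specific safety nets and the handle_not_understood helper are hypothetical examples, not the exact flow we build for clients.

```python
# Escalating safety nets instead of repeating "Sorry, I don't know."
SAFETY_NETS = [
    "offer_faq_link",          # 1st miss: point to self-service content
    "offer_video_tutorial",    # 2nd miss: a walk-through (e.g., resetting the modem)
    "offer_live_agent",        # 3rd miss: hand over to a human
    "offer_phone_number",      # last resort: another channel entirely
]


def handle_not_understood(miss_count: int) -> str:
    """Pick the next safety net based on how many times the bot has already missed."""
    index = min(miss_count, len(SAFETY_NETS) - 1)
    return SAFETY_NETS[index]
```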

A not understood flow that triggers multiple safety nets before wrapping up

Applying these lessons so your next chatbot doesn't suck

I never said it would be easy to build a chatbot that doesn’t suck ;) But, I promise that if you apply these four lessons, you’ll have a chatbot that helps bring your users closer to a solution. 

Ivan Westerhof is COO and Managing Partner of Campfire AI

About the author

Ivan accidentally found himself in the world of conversational AI when interviewing for a job at a marketing agency back in 2018. After the interview, the agency called him back and said that he was a perfect fit for the AI Team. From there, the rest is history. In 2020, he co-founded Campfire AI with Alexis Safarikas. As COO and Managing Partner, Ivan is the master of systems and creator of processes.
