Introducing GRACE: GPT Reprogrammable Assistant with Code Execution

Fri, Jan 27, 2023 17-minute read

TL;DR: GRACE (GPT Reprogrammable Assistant with Code Execution) leverages GPT-3 to implement a human-like chatbot that’s capable of retrieving information from a knowledge base and processing the customers’ requests via an API backend. It showcases how large language models can be made to interact with external systems for knowledge retrieval and performing actions in the real world. Check out the project’s GitHub repository.

I’ve briefly dealt with chatbots in my NLP work, and it’s always been an exercise in frustration. Programming a bot is a tedious undertaking, and the resulting user experience often makes you wonder if it was all worth it in the first place.

Specifically, Rasa (the system I’ve had some experience with - I imagine others are similar) starts with identifying the intent of the user’s utterance. It then returns a response, along with potentially executing an action of some sort - like issuing an API request to cancel the user’s booking. What exactly the chatbot will do is determined by a combination of explicit rules and the conversation examples (“stories”) that you train the bot on.

Developing anything but a trivial chatbot with this approach requires a lot of data. First, you need to identify all the important intents a customer may have, such as saying hello, canceling an order, idle chitchat, and many others. For each intent, you need to compile example utterances so that the chatbot can pigeonhole incoming messages into the correct intents. You then need to write the rules and/or compile a bunch of stories to teach the bot how to respond at any given point in the conversation. You also need to write all the possible utterances that your chatbot will ever output. And last but not least, you have to write the code for the actions the bot will be executing.

Rasa makes parts of this process easier with interactive tools like rasa interactive, but the underlying limitations remain:

  • The chatbot is limited to predefined responses
  • Changing the conversation flow requires changing the rules and stories, of which there can be many
  • The bot is still a rather rigid program under the hood, and this fact quickly becomes evident to the user

Putting Rasa aside, there are now models like GPT-3 and ChatGPT, which display amazing language capabilities and are able to follow fairly sophisticated instructions. However, most of the examples you see for those models revolve around simple request-response scenarios like “Explain quantum mechanics to me like I’m 5 years old” or “Write a blog post on office politics”. When you do encounter chat-related examples, those are firmly set in a fantasy land, like “Marv is a chatbot that reluctantly answers questions with sarcastic responses” - fun to play with but hardly anything useful.

Can we leverage the breathtaking language mastery of GPT-3/ChatGPT to create a human-like chatbot that’s grounded in the reality of a particular business and can do useful things like processing the customers’ requests?

Turns out, we can.

A Simple Chatbot with OpenAI API

First things first: OpenAI doesn’t yet offer API access to ChatGPT, so we’ll be working with GPT-3 instead. Not to worry, though: the latest GPT-3 model (text-davinci-003) is extremely capable and doesn’t fall too far behind ChatGPT. What’s important for us, too, is that text-davinci-003 is really good at following instructions and working with code.

Before we dive in, though, we need a basic understanding of how GPT-3 actually works. GPT-3 is a language model. As such, what it does is basically predict the next word in a piece of text. Do this repeatedly, and you have a system that can write texts of pretty much arbitrary length. This is exactly what GPT-3 does, and it does it well.

To see how it works, let’s use OpenAI API to write a continuation of a sentence.

import openai

openai.api_key = "sk-..."  # your OpenAI API key

# Ask GPT-3 to continue the prompt text
completion = openai.Completion.create(
    engine="text-davinci-003",
    prompt="A large black cat crossed the road."
)

# The API returns a list of candidate completions; print the first one
print(completion.choices[0].text)

When you run this Python code, you may get something like the following:

The cat could have been crossing the road for a variety of reasons.

So far, so good. But how can we use this “autocomplete on steroids” to make a chatbot? How about this: GPT-3 and the human will take turns adding text to a transcript of their chat session. GPT-3 will be completing the text when it’s the bot’s turn to speak, and the human will be entering their utterances when it’s theirs.

All this may sound a bit convoluted, but in reality, the approach is quite simple. We start with the initial prompt containing the basic instructions for GPT-3:

You are an AI assistant for Death Star, a Star Wars-themed restaurant in Cupertino, CA. You chat with customers to help process their requests. For example:

Customer: I'd like to cancel my booking.
AI: Sure, do you have the reference number?

A transcript of your chat session with a customer follows.

AI:

There are a couple of things to note about this prompt. First, it provides the bot with an identity, which, according to OpenAI API docs, is not strictly necessary but is usually helpful. Second, it supplies clear instructions and illustrates them with an example: an excerpt from an actual conversation. All of this helps ground GPT-3 in what we’re expecting from it, including how a conversation is formatted.

(The prompt used in GRACE is more complicated than that, but the basic idea is the same.)

Now throw this at GPT-3, and it will generate a completion:

Hi there, welcome to Death Star! How can I help you today?

Let’s now assume that the customer types in:

Hi, I'd like to book a table.

To get the next response from the bot, we append the new utterances to the prompt and ask GPT-3 for a completion for the updated version:

You are an AI assistant for Death Star, a Star Wars-themed restaurant in Cupertino, CA. You chat with customers to help process their requests. For example:

Customer: I'd like to cancel my booking.
AI: Sure, do you have the reference number?

A transcript of your chat session with a customer follows.

AI: Hi there, welcome to Death Star! How can I help you today?
Customer: Hi, I'd like to book a table.
AI:

Now GPT-3 will generate this:

Great! How many people will be joining you?

Rinse and repeat, and you may end up with a conversation like this one:

AI: Hi there, welcome to Death Star! How can I help you today?
Customer: Hi, I'd like to book a table.
AI: Great! How many people will be joining you?
Customer: There'll be 3 of us.
AI: Perfect, what day and time would you like to book for?
Customer: 5:30 pm tomorrow please.
AI: Okay, 5:30 pm tomorrow it is. Do you have any special requests?
Customer: Yes, one of us is on a gluten-free diet.
AI: No problem, we offer gluten-free options. Is there anything else I can help you with?
Customer: That's it, thanks. Do I get a booking reference?
AI: Yes, your booking reference is 8HG9J2. Please bring it with you when you come in tomorrow. Is there anything else I can help you with?
Customer: No, thanks. Bye!
AI: Alright, see you tomorrow! Have a great day!

The bot’s fluency is amazing, but don’t forget that most of what you’re looking at here is just extremely convincing bullshit. GPT-3 knows nothing about our imaginary restaurant, or its business logic for booking tables. Likewise, no actual reservation takes place and the reference number that the bot provides the customer with is pure fantasy.
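Mechanically, the whole chat loop boils down to a handful of lines of Python. Here’s a minimal sketch (the stop sequence, the max_tokens cap, and the shortened prompt are my own simplifications, not GRACE’s actual code):

import openai

openai.api_key = "sk-..."

# The initial prompt (shortened here; see the full version above).
INITIAL_PROMPT = """You are an AI assistant for Death Star, a Star Wars-themed restaurant in Cupertino, CA. You chat with customers to help process their requests.

A transcript of your chat session with a customer follows.

AI:"""

transcript = INITIAL_PROMPT
while True:
    completion = openai.Completion.create(
        engine="text-davinci-003",
        prompt=transcript,
        max_tokens=150,       # an arbitrary cap on the bot's reply length
        stop=["Customer:"],   # keep GPT-3 from writing the customer's lines too
    )
    bot_line = completion.choices[0].text.strip()
    print(f"AI: {bot_line}")
    # Append both turns and hand the grown transcript back to GPT-3.
    transcript += f" {bot_line}\nCustomer: {input('Customer: ')}\nAI:"

The stop sequence matters: without it, GPT-3 will happily hallucinate the customer’s side of the conversation as well.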

Connecting the Chatbot to an API Backend

What we’ve done so far is pretty old news - the conversation use case is well-covered in OpenAI API docs, and there are even a few playground examples (one, two) you can explore.

What’s way more interesting is how we can hook this thing up to an API backend and make it actually process the customer’s requests.

Remember that text-davinci-003, apart from being a huge language model, is also very good at following instructions and understanding code. We can leverage both of these traits to teach our bot how to send programmatic JSON commands to our backend. Here’s an example prompt:

You are an AI assistant for Death Star, a Star Wars-themed restaurant in Cupertino, CA. You chat with customers to help process their requests. For example:

Customer: I'd like to get the details of my booking, reference HTNABL.
AI: Sure, give me a second. [json]{"command": "get_booking_details", "params": {"reference": "HTNABL"}}[/json]
Backend: (To AI) Found booking: Andrew Wallace, 3 people at 8:15 PM on May 22, 2023

The following Python commands are available to you:

- cancel_booking(reference: str) - cancels a booking. Example JSON: [json]{"command": "cancel_booking", "params": {"reference": "Z9B23B"}}[/json]

A transcript of your chat session with a customer follows.

AI:

Note how we provide the bot with example command invocations and introduce a third participant to the chat: the backend. In full adherence to Shakespeare-style playwriting, the backend addresses the AI in an aside; my experiments have shown that this actually helps GPT-3 treat the backend responses as invisible to the customer.

By now, you may have a question: GPT-3 can easily write verbatim Python code to call the cancel_booking() function directly, so why bother with JSON? It’s a security measure - we, of course, know better than to eval() any code that a language model may throw at us.

Now that we have the new prompt, let’s see what happens when we start a chat session with it.

AI: Hi there, how can I help you?
Customer: Hello, I'd like to cancel my booking please.
AI: Sure thing. What's the reference number for your booking?
Customer: It's BHN23G.
AI: Got it. Let me take a look. [json]{"command": "cancel_booking", "params": {"reference": "BHN23G"}}[/json]

For starters, it’s pretty crazy how GPT-3 is able to anticipate which backend command it will need to invoke and ask the customer for the data (the reference number) required to execute it.

This fluency aside, though, I hope you can see where things are going. We can now parse the bot’s responses and check if there’s a JSON command for us to process. If there is, we call the underlying Python function and append its human- (bot-?) readable result to the prompt as the backend’s utterance. Note how the bot seamlessly picks up the result and acts accordingly:

AI: Hi there, how can I help you?
Customer: Hello, I'd like to cancel my booking please.
AI: Sure thing. What's the reference number for your booking?
Customer: It's BHN23G.
AI: Got it. Let me take a look. [json]{"command": "cancel_booking", "params": {"reference": "BHN23G"}}[/json]
Backend: (To AI) The booking is canceled
AI: Ok, I've successfully canceled your booking. Is there anything else I can do for you?
Customer: That's it, thanks!
AI: You're welcome, have a great day!

Of course, our UI does not show the JSON commands and backend responses to the customer, so to the human, the conversation looks like this:

AI: Hi there, how can I help you?
Customer: Hello, I'd like to cancel my booking please.
AI: Sure thing. What's the reference number for your booking?
Customer: It's BHN23G.
AI: Got it. Let me take a look.
AI: Ok, I've successfully canceled your booking. Is there anything else I can do for you?
Customer: That's it, thanks!
AI: You're welcome, have a great day!
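To make the parsing step concrete, here’s a minimal sketch of how the [json]...[/json] commands can be extracted and dispatched. The cancel_booking handler below is a stand-in of mine; in GRACE, the real commands live in backend.py:

import json
import re

# Matches every [json]...[/json] span in the bot's utterance.
COMMAND_RE = re.compile(r"\[json\](.*?)\[/json\]", re.DOTALL)

# Stand-in handler; the real implementation would talk to the database.
def cancel_booking(reference: str) -> str:
    return "The booking is canceled"

HANDLERS = {"cancel_booking": cancel_booking}

def process_bot_line(bot_line: str) -> list[str]:
    """Run any commands the bot emitted, returning the backend
    utterances to append to the transcript."""
    utterances = []
    for match in COMMAND_RE.finditer(bot_line):
        command = json.loads(match.group(1))
        # Dispatch by command name instead of eval()'ing model output.
        handler = HANDLERS[command["command"]]
        result = handler(**command["params"])
        utterances.append(f"Backend: (To AI) {result}")
    return utterances

Dispatching by name through a whitelist of handlers is exactly why the JSON detour is worth it: the model never gets to execute arbitrary code.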

What I find really impressive (among many other things) is that GPT-3 can intelligently combine the commands to achieve the requested result. Check this out:

You are an AI assistant for Death Star, a Star Wars-themed restaurant in Cupertino, CA. You chat with customers to help process their requests. For example:

Customer: I'd like to get the details of my booking, reference HTNABL.
AI: Sure, give me a second. [json]{"command": "get_booking_details", "params": {"reference": "HTNABL"}}[/json]
Backend: (To AI) Found booking: Andrew Wallace, 3 people at 8:15 PM on May 22, 2023

The following Python commands are available to you:

- book_table(date: str, time: str) - books a table. Example JSON: [json]{"command": "book_table", "params": {"date": "next Monday", "time": "8:30 pm"}}[/json]
- cancel_booking(reference: str) - cancels a booking. Example JSON: [json]{"command": "cancel_booking", "params": {"reference": "Z9B23B"}}[/json]

A transcript of your chat session with a customer follows.

AI: Hi there, how can I help?
Customer: Hi, I'd like to book a table for 4 pm tomorrow.
AI: Sure, give me a second. [json]{"command": "book_table", "params": {"date": "tomorrow", "time": "4 pm"}}[/json]
Backend: (To AI) Booking confirmed, reference: Z9B23B
AI: The booking is confirmed, your reference is Z9B23B.
Customer: Sorry, that should have been 5 PM, can you change it?
AI: Sure, I'll adjust the booking. [json]{"command": "cancel_booking", "params": {"reference": "Z9B23B"}}[/json][json]{"command": "book_table", "params": {"date": "tomorrow", "time": "5 pm"}}[/json]

Lacking a command for changing a booking, GPT-3 attempts to combine available commands in order to cancel the booking and then create a new one with the correct details. Let me tell ya, it blew my mind when I first saw it.

Retrieving Answers from a Knowledge Base

Our chatbot is now useful in that it can actually process the customers’ requests. A big problem remains, though: the bot knows nothing about our restaurant. So if a customer inquires about gluten-free menu options or parking, the bot will happily produce the kind of convincing bullshit we’ve already seen earlier.

There are several ways of dealing with this. One option is to have the prompt explicitly instruct the bot that it knows nothing about Death Star and should refuse to answer customers’ questions about it. That would make for a pretty gruesome user experience, though.

The second option is to list basic facts and FAQs about the restaurant in the prompt itself and let the chatbot draw from this information. This should work, but we don’t really want to bloat the prompt too much: OpenAI bills us by the text token, and there’s a token limit on the API request length, too.

The third option, which is what GRACE goes with, is to let the bot consult an external knowledge base with the help of a special look_up backend command. This new command takes a question and uses simple semantic search to retrieve the most relevant answer from our knowledge base. This answer is then injected into the chat as a backend utterance.

Here’s how it works:

You are an AI assistant for Death Star, a Star Wars-themed restaurant in Cupertino, CA. You chat with customers to help process their requests. For example:

Customer: I'd like to get the details of my booking, reference HTNABL.
AI: Sure, give me a second. [json]{"command": "get_booking_details", "params": {"reference": "HTNABL"}}[/json]
Backend: (To AI) Found booking: Andrew Wallace, 3 people at 8:15 PM on May 22, 2023

The following Python commands are available to you:

- cancel_booking(reference: str) - cancels a booking. Example JSON: [json]{"command": "cancel_booking", "params": {"reference": "Z9B23B"}}[/json]
- look_up(question: str) - look up an answer to a question about Death Star. Example JSON: [json]{"command": "look_up", "params": {"question": "What are your opening hours?"}}[/json]

If you need to obtain some facts about Death Star, you can use the look_up command to retrieve them.

A transcript of your chat session with a customer follows.

AI: Hi there! How can I help you today?
Customer: Hello, do you have on-site parking?
AI: Certainly! Let me look up the parking information for you. [json]{"command": "look_up", "params": {"question": "Do you have on-site parking?"}}[/json]
Backend: (To AI) We don't offer on-site parking, but you can park on the street for free.
AI: Unfortunately, we don't offer on-site parking, but you can park on the street for free. Is there anything else I can help you with?

The backend response above is the verbatim text of the most relevant passage that the backend pulled from our FAQ/knowledge base. Of course, GPT-3 has no problem relaying it to the customer.
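In case you’re wondering what “simple semantic search” can look like, here’s a sketch based on OpenAI embeddings and cosine similarity. The toy passages and the choice of embedding model are my assumptions; GRACE’s actual retrieval code may differ:

import numpy as np
import openai

# Toy knowledge base: one passage per fact or FAQ answer.
PASSAGES = [
    "We don't offer on-site parking, but you can park on the street for free.",
    "We offer gluten-free options on request.",
]

def embed(texts):
    response = openai.Embedding.create(
        model="text-embedding-ada-002", input=texts
    )
    return np.array([item["embedding"] for item in response["data"]])

# Embed the knowledge base once, up front.
PASSAGE_VECTORS = embed(PASSAGES)

def look_up(question: str) -> str:
    """Return the knowledge base passage most relevant to the question."""
    question_vector = embed([question])[0]
    # OpenAI embeddings are unit-length, so a dot product is cosine similarity.
    scores = PASSAGE_VECTORS @ question_vector
    return PASSAGES[int(np.argmax(scores))]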

Testing the Conversation Flows

GRACE is a fun bot to chat with, but how can you ensure, in an automated way, that it does what the customer asks it to do? GPT-3 responses are not deterministic: you can get a different completion for the same prompt on a different run. And even if they were, any change to the prompt would render a fixed test sequence of human responses useless.

Enter chatbot pitting, a.k.a. the chatbot mud fight! We can replace the human with another GPT-3 chatbot that imitates a customer with a particular request. In essence, we’ll be cross-wiring the two chatbots. By checking the state of our backend database after the chat session, we can determine whether GRACE fulfilled the customer’s request or not.

We start the mud fight test session by letting the AI chatbot output its first utterance. Let’s say it’s this one:

Hi there! How can I help you?

Next, we append this utterance to the customer chatbot’s initial prompt (which can be pretty simple) and request a completion:

You are a customer of Death Star, a Star Wars-themed restaurant in Cupertino, CA. You are chatting to the restaurant's AI assistant. You are looking to book a table under the name of Jeremiah Biggs, for 3 people at 8 pm on June 23, 2023. You don't provide all of this information at once but rather respond to the AI assistant's prompts.

A transcript of your chat session with the AI assistant follows.

AI: Hi there! How can I help you?
Customer:

Running this through GPT-3 gives us our customer’s first utterance:

Hi, I'd like to make a reservation for 3 people at Death Star.

We then feed this utterance back to the AI chatbot and repeat the process until we reach the end of the conversation. Then, we check if the backend database actually contains the booking for Jeremiah Biggs.
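Here’s a rough sketch of the cross-wiring loop. The complete() helper and the end-of-conversation heuristic are simplifications of mine, not GRACE’s actual test harness:

import openai

def complete(prompt: str, stop: str) -> str:
    completion = openai.Completion.create(
        engine="text-davinci-003", prompt=prompt, stop=[stop], max_tokens=150
    )
    return completion.choices[0].text.strip()

def mud_fight(ai_prompt: str, customer_prompt: str, max_turns: int = 20) -> str:
    """Cross-wire the AI and customer bots; return the full transcript."""
    transcript = ""
    for _ in range(max_turns):
        ai_line = complete(ai_prompt + transcript + "\nAI:", stop="Customer:")
        transcript += f"\nAI: {ai_line}"
        customer_line = complete(
            customer_prompt + transcript + "\nCustomer:", stop="AI:"
        )
        transcript += f"\nCustomer: {customer_line}"
        if "bye" in customer_line.lower():  # crude end-of-chat detection
            break
    return transcript

After the session ends, the test asserts against the backend database: for example, that a booking for Jeremiah Biggs exists for 3 people at 8 pm on June 23, 2023.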

Limitations and Lessons Learned

So there you have it - GRACE is a chatbot built with GPT-3 that can retrieve information from a knowledge base and process the customers’ requests by communicating with an API backend. You can easily adapt GRACE for your use case by implementing your API in backend.py and adjusting your domain information in domain.yaml. What you don’t have to do is code the intents, train the bot on numerous example stories, or program any of its utterances. The magic is in a cleverly constructed GPT-3 prompt, plus some Python alchemy to wire things up.

GRACE is not perfect, of course. It sometimes forgets to output the command JSON while happily reporting to the customer that their request has been processed. Or it invents stuff, like passing “Customer” as the customer’s name to the backend. It can also refuse to answer a question about the business for no apparent reason, among other things.

It’s also a very well-known fact that language models are bad with dates and numbers in general. I’ve given up on requiring the YYYY-MM-DD date format in backend calls because there’s no way in hell GPT-3 would correctly transform a relative date like “next Monday” into an absolute one. Instead, I’m letting GPT-3 pass all dates verbatim and then parsing them with parsedatetime on the backend.
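For reference, here’s what the backend-side date parsing with parsedatetime can look like (a minimal sketch; GRACE’s actual code may differ):

import datetime
import parsedatetime

cal = parsedatetime.Calendar()

def parse_date(verbatim: str) -> datetime.datetime:
    """Turn a relative date like 'next Monday' into an absolute one,
    relative to the current date and time."""
    time_struct, status = cal.parse(verbatim)
    if status == 0:  # parsedatetime couldn't make sense of the input
        raise ValueError(f"Could not parse date: {verbatim!r}")
    return datetime.datetime(*time_struct[:6])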

You need to take special care when naming the backend functions and parameters, too. It actually does matter to GPT-3 whether a command is called retrieve_booking vs. get_booking_details. Not just because the bot may slip the name of the command to the customer (“Now I’m executing the retrieve_booking command to get your booking”), but also because the latter name seems to make it clearer to GPT-3 what the command does and what it can be used for. Same for the parameters - I’ve found full_name to work better than just name, for example.

A lot of my time developing GRACE was spent polishing the prompt. Which is no wonder, because with GRACE, the prompt is the program, if you can call it that. My prediction is that as we move forward with large language models, LM Prompt Designer will become a legitimate job title.

Conclusions

At the moment, GRACE is just a proof of concept. Yet, to me, it drives home two powerful points.

The first one is that the days of fine-tuning with large amounts of labeled data in chatbot design are drawing to a close. The future is few-shot, not fine-tuning. It belongs to models like GPT-3, where you formulate the task in natural language, provide a couple of examples and let the model take care of the rest - just like you would do with a reasonably intelligent human.

The second point is that GRACE demonstrates how, through multistep inference, large language models can be made to interact with external systems for knowledge retrieval and performing actions in the real world. We’re used to treating GPT-3/ChatGPT as “things in themselves”, limited just to the knowledge they acquired during training. GRACE shows that the model can be instructed to draw data from elsewhere as it interacts with the user.

This, to me, opens up a whole new area of experimentation: hybrid systems, where the language models’ weak points (such as math, or knowledge of the world beyond a certain date) are shored up by external modules running traditional code. Could this be a step towards a symbiosis of deep learning and symbolic AI, which, we’re told, is necessary to really move the field forward?