AI Agents are a broad and often confusing topic. As we continue thinking about monetizing consumer GPT chatbots, let's focus on AI Agents' ability to take actions. Many people have seen agents that operate like a human using a computer or a browser. These are useful demonstrations of what it will be possible to automate, but they are likely too unpredictable and too costly to scale. The more constraints and context you provide to a GenAI system, the more predictable its output will be. That fact favors a model of many specialized agents over a single agent with human-level capabilities.
Despite "AI" being part of the name, an AI Agent doesn't have to use any AI at all. An agent can be built to accomplish its goals however the designer chooses. It could work purely by prompting an LLM, but it is more likely that an agent will have well-defined tasks built with traditional programming methods. The agent might use an LLM to decide which path to take or to offer a natural language interface, but why would you choose a probabilistic method when you want a deterministic one?
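To make that split concrete, here is a minimal sketch of an agent whose tasks are ordinary deterministic code, with an LLM used only to map free-form user text to a task name. All names are illustrative, and a simple keyword matcher stands in for the LLM call.

```python
# Sketch: deterministic agent tasks, probabilistic routing kept at the edge.
# classify_intent is a stand-in for an LLM intent classifier (an assumption,
# not a real API); everything downstream of it is plain, testable code.

def classify_intent(user_text: str) -> str:
    """Map free-form text to a task name. In practice this is where an
    LLM call would go; here a keyword match keeps the sketch runnable."""
    text = user_text.lower()
    if "menu" in text:
        return "show_menu"
    if "order" in text:
        return "place_order"
    return "unknown"

# Deterministic business logic: no model in the loop past this point.
MENU = {"burger": 8.99, "fries": 3.49}

def show_menu() -> str:
    return ", ".join(f"{item}: ${price:.2f}" for item, price in MENU.items())

def place_order() -> str:
    return "Order placed."

TASKS = {"show_menu": show_menu, "place_order": place_order}

def handle(user_text: str) -> str:
    task = TASKS.get(classify_intent(user_text))
    return task() if task else "Sorry, I can't help with that."
```

Swapping the keyword matcher for a real LLM call changes nothing downstream: the model's job is confined to picking a task name, and the tasks themselves stay deterministic.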
In this many-agent model, the GPT chatbot becomes an orchestrator. You can think of the chatbot as the operating system: it plays the role that iOS or Android plays on your phone, while agents are the apps you install. So you could have agents for the stores and restaurants you buy from, just as you might order from their mobile apps today. Ordering a meal from a restaurant doesn't require all of human knowledge; you simply need to select from a limited set of choices and pay. It might be nice to use natural language to order, but it isn't a requirement.
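The orchestrator model can be sketched as a registry: the chatbot keeps a list of connected agents, the way a phone keeps installed apps, and routes each request to the agent the user names. This is a hypothetical illustration under that analogy, not any vendor's actual API.

```python
# Sketch: the chatbot as an "operating system" holding a registry of agents.
# connect() plays the role of installing an app; route() plays the role of
# the OS handing a request to the right app.

class Agent:
    """An agent with a name and a narrow, well-defined job."""
    def __init__(self, name: str):
        self.name = name

    def handle(self, request: str) -> str:
        return f"{self.name} handling: {request}"

class Orchestrator:
    def __init__(self):
        self._agents: dict[str, Agent] = {}

    def connect(self, agent: Agent) -> None:
        # Analogous to installing an app on the phone.
        self._agents[agent.name] = agent

    def route(self, agent_name: str, request: str) -> str:
        agent = self._agents.get(agent_name)
        if agent is None:
            return f"No agent named {agent_name!r} is connected."
        return agent.handle(request)

chat = Orchestrator()
chat.connect(Agent("pizza_place"))
print(chat.route("pizza_place", "one large pepperoni"))
```

Unlike sandboxed mobile apps, nothing here stops the orchestrator from passing context between agents, which is the point made below about inter-app communication.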
If this model takes hold, the mobile app store will have been jailbroken. Your GPT chat interface is the only app you need. Mobile device security policy limits or prohibits inter-app communication; this doesn’t need to be the case with agents.
Let me provide a real-world example to take us from the abstract to the concrete. I am driving home from my son's basketball game with my wife and son in the car. I am navigating with a map application on my phone. We want to make our way home, but we also want to pick up food along the way. Not everyone wants to eat from the same restaurant. Using the voice interface to my GPT chatbot app, I tell it where I want to order from. Each restaurant has its own agent that I have previously connected to my chatbot account. Each agent can then walk me through ordering and checkout using the chatbot's voice capabilities. Lastly, I can ask the chatbot to send updated directions with the appropriate waypoints to my phone.
This scenario is helpful to the end-user. It is also great for advertising, because it ties purchasing directly into the same platform as the advertising, and that platform accumulates the data that enables precise targeting. Let's say I'm ordering fried chicken from one restaurant. If there is a competitor in the area, they could offer me a discount to order from them. They could even look at my route home and tell me that not only will I save money, but their location is on a more direct route for me. One factor in whether I take that offer is how much friction is involved. In the current mobile app environment, I would need to install and configure a new app on my phone. I am probably not going to attempt that while driving and in a hurry. But if the app is instead an agent, then all of the configuration is done in the cloud. If I allow the GPT chatbot to manage a payment instrument, it can take care of all of the configuration for me.
An integrated advertising and purchasing platform with detailed user data is a great way to make money. In the next and final segment, we will tie up some miscellaneous loose ends and wrap up the series.
