Contingency plans are nothing new for business operations. Workers get sick, take vacation, or don’t perform as expected. The business needs some slack capacity and/or a prioritization system for when there isn’t enough capacity to meet demand. Process automation isn’t particularly new either. Again, you can build in slack capacity and/or task prioritization to manage outages or issues in your automation solutions.
So, why should AI Agents be any different? They are just another way to automate tasks, right? The difference is that AI Agents add three kinds of complexity: probabilistic output, a broad scope of actions, and behavior that changes over time.
🎲 LLM-powered agents are probabilistic, meaning that the same input produces a distribution of different outcomes. The output won’t be the same every time. With traditional automation, you can, in theory, have complete test coverage of all possible scenarios. With AI Agents, you cannot.
🌍 LLMs are trained on a massive amount of data. You likely don’t want your AI Agent to make use of every part of that dataset. That means putting in guardrails to constrain the possible outputs. Human workers share this broad scope but are generally easier to keep from going off the rails.
💡 Many agents change over time, whether through reinforcement learning, fine-tuning, or model updates from the provider. This means you constantly have to reevaluate how the system is responding.
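One practical consequence of probabilistic output is that a single pass/fail test tells you very little; you end up measuring outcome rates instead. Here is a minimal sketch of that idea in Python. The `fake_agent` function, its outcome labels, its weights, the sample prompt, and the 5% threshold are all made up for illustration; in practice you would call your real agent in its place.

```python
import random
from collections import Counter

OUTCOMES = ["approve", "escalate", "reject"]

def fake_agent(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM call: same input, varying output."""
    # Assumed outcome probabilities, purely for illustration.
    return rng.choices(OUTCOMES, weights=[0.90, 0.08, 0.02])[0]

def outcome_rates(prompt: str, runs: int, seed: int = 0) -> dict[str, float]:
    """Send the same prompt many times and return the share of each outcome."""
    rng = random.Random(seed)
    counts = Counter(fake_agent(prompt, rng) for _ in range(runs))
    return {outcome: n / runs for outcome, n in counts.items()}

rates = outcome_rates("Refund request for order 123", runs=1000)
print(rates)

# Gate on a rate, not on one lucky (or unlucky) response.
ACCEPTABLE_REJECT_RATE = 0.05  # assumed business threshold
print("within tolerance:", rates.get("reject", 0.0) <= ACCEPTABLE_REJECT_RATE)
```

Because the agent can change over time, a check like this is something you would rerun on a schedule, not once before launch.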
AI Agents are homogeneous replicas like process automation, but have the broad task scope of human workers. Combine this with machine learning and probabilistic output, and it is a bit like a dev team releasing updates into production at random with no notification. If something goes wrong, it’s going wrong everywhere, all at once.
So, what do we do? How do we leverage the advantages of AI Agents while minimizing risk?
1️⃣ Use the simplest form of automation that you can. Rules-based automation is powerful, easier to maintain, and more predictable.
2️⃣ Keep some people in the process. Subject matter experts can monitor and guide automation while also providing some fail-over capacity in emergencies.
3️⃣ Monitor your business with real-time operations, alerting, and anomaly detection.
4️⃣ Enable real-time tuning of AI Agents.
5️⃣ Consider a heterogeneous AI environment with different models, system prompts, etc., running in parallel. Distribute the workload across these agents. Monitor and adjust based on your business needs. You can pull out a problematic agent without taking down your whole infrastructure.
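To make points 3 through 5 a bit more concrete, here is a rough Python sketch of a heterogeneous agent pool: work is spread across variants by weight, each variant's recent results are tracked, and a variant that fails too often is pulled automatically. Every name here (`AgentVariant`, `AgentPool`, the variant labels) and every threshold is an assumption for illustration, not a reference to any particular product or framework.

```python
import random
from collections import deque
from dataclasses import dataclass, field

@dataclass
class AgentVariant:
    name: str              # hypothetical label, e.g. a model + system prompt pair
    weight: float          # relative share of the workload
    enabled: bool = True
    # Rolling window of recent results: True = success, False = failure.
    recent: deque = field(default_factory=lambda: deque(maxlen=50))

    def error_rate(self) -> float:
        if not self.recent:
            return 0.0
        return 1 - sum(self.recent) / len(self.recent)

class AgentPool:
    def __init__(self, variants, max_error_rate=0.2, min_samples=20, seed=None):
        self.variants = {v.name: v for v in variants}
        self.max_error_rate = max_error_rate  # assumed alert threshold
        self.min_samples = min_samples        # don't judge on too little data
        self.rng = random.Random(seed)

    def pick(self) -> AgentVariant:
        """Choose an enabled variant at random, in proportion to its weight."""
        active = [v for v in self.variants.values() if v.enabled]
        if not active:
            raise RuntimeError("No agents enabled: fail over to people or rules")
        return self.rng.choices(active, weights=[v.weight for v in active])[0]

    def record(self, name: str, ok: bool) -> None:
        """Log a result and pull the variant if its error rate is too high."""
        v = self.variants[name]
        v.recent.append(ok)
        if len(v.recent) >= self.min_samples and v.error_rate() > self.max_error_rate:
            v.enabled = False

pool = AgentPool(
    [
        AgentVariant("model-a/prompt-1", weight=0.5),
        AgentVariant("model-b/prompt-1", weight=0.3),
        AgentVariant("model-a/prompt-2", weight=0.2),
    ],
    seed=42,
)

# Simulate one variant going bad; the others keep taking work.
for _ in range(25):
    pool.record("model-b/prompt-1", ok=False)

print(pool.variants["model-b/prompt-1"].enabled)
print(pool.pick().name)
```

How a result gets judged as `ok` is the hard part and depends on your business. In a real setup that signal would likely come from your monitoring and from the subject matter experts kept in the process, and changing a weight or re-enabling a variant is where real-time tuning comes in.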
Agentic workflows are powerful but will require an evolution of operational support teams and systems to deliver on their promise.
