Reliable AI agents

The evolution of AI applications from chatbots to agents to reliable agents

Generative AI applications have evolved from simple single-user chatbots to multi-agent workflows that can automate work effectively in an enterprise. Reliability is the remaining crucial challenge to overcome.

We are in the midst of an intense and rapid evolution of generative-AI technology from purely consumer applications towards business applications. The launch of ChatGPT by OpenAI kicked off the first wave of AI apps (chatbots and co-pilots), which focused on consumers and were dominated by an interactive chat UI. The second wave followed a year later, again sparked by OpenAI, whose "GPTs" platform let people build simple AI apps and share them with others.

The third wave of generative-AI applications is "agentic": these applications add automation and a number of other foundational capabilities essential for applying AI to robust enterprise scenarios. This class of business applications has been dubbed "Service as Software". Whether they are called "agentic" AI applications or "service as software" enterprise applications, these AI-native applications represent a new era of software technology with generative-AI models at their core. However, most enterprises are unable to actually deploy agentic applications because these applications lack reliability and consistency. The emerging fourth wave, reliable AI agents, will unlock the full potential of AI automation.

The evolution of AI-native applications

Since the initial launch of ChatGPT, there have been three waves of AI-native applications, and a fourth is now emerging.

Wave 1: Generic Chatbot & Co-Pilot

The first version of ChatGPT and other "intelligent chatbots" belong to this earliest wave. The application is meant to support a single user who is both the source of instructions (the "prompt engineer") and, when needed, the human in the loop who vets and corrects the LLM's work. The user experience is a direct conversation with the LLM. Many vertical SaaS products and productivity suites also added this class of chat-based "copilot" to give their end-users interactive AI engagement.

This kind of simple single-user interactive application is inadequate for most enterprise scenarios. There is no designer providing instructions or shaping the application to a particular purpose. And there is no automation.

Wave 2: Chatbot + design

The second wave of "assistants", led by OpenAI's "GPTs", began to specialize AI applications for particular problems and introduced the beginnings of an application architecture. A typical wave-2 assistant has a human designer or owner who creates it in a design phase by providing some prompts, some documents for context, and some custom tools that integrate with other systems. The behavior of the AI application is thereby customized to serve a particular purpose. This is effectively "programming" by describing what the application should do, without having to express that logic in complex software code.

These second-wave AI applications still have a simple conversational user model, but the application itself now has a no-code "programming model". Their expressive power is limited, and there are few mechanisms for steering and controlling the AI's behavior. There is also no capacity for automated work: if the end-user doesn't interactively drive the work and check the outcomes, no work happens.

Wave 3: AI agents that automate

The third wave of AI applications has focused on automating work using AI "agents". These agents do work on behalf of a user, team, or organization. The work may be long-running, and it may be triggered by events in the environment, such as incoming messages or data changes.

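The primary motivation for automated applications is the desire for business productivity. AI agent automation is fundamentally different from traditional automation or workflow applications: rather than executing fixed, hand-coded steps, an agent interprets the designer's instructions and applies them to the situation at hand.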

Application model: Each AI agent platform provides a particular application model with a certain level of expressive power. Many platforms combine a workflow model, a document collection model, an integration model, and so on.
Design language: The logic of the application has to be defined by a human designer. Depending on the platform, this may be via code, via natural language, or via some no-code or low-code design paradigm. For example, in Thunk.AI, the design language is entirely "no-code" via natural language.
Automation runtime: Since automation is the main goal of these applications, the platform has to support automated units of work that interpret and execute the application logic repeatedly. These are "AI agents". Human agents may be "in-the-loop" to approve or augment the AI-driven work, when needed.
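The three parts above can be illustrated with a minimal Python sketch. It is a toy built on stated assumptions, not how Thunk.AI or any other platform actually works: the natural-language `INSTRUCTIONS` string stands in for the design language, a list of event-triggered `WorkItem`s stands in for a very simple application model, and `run_automation` plays the automation runtime. All names are hypothetical; `stub_agent` is a placeholder for an LLM call, and `reviewer` is a placeholder for a human in the loop.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical sketch: none of these names come from Thunk.AI or any real platform.


@dataclass
class WorkItem:
    """One automated unit of work, triggered by an event such as an incoming message."""
    event: str
    result: Optional[str] = None
    status: str = "pending"


# Design language: the designer's logic, expressed here as a plain natural-language string.
INSTRUCTIONS = "Classify each incoming message as 'invoice' or 'other'."


def stub_agent(instructions: str, event: str) -> str:
    """Stand-in for an LLM-backed agent; it ignores the instructions and uses a keyword check."""
    return "invoice" if "invoice" in event.lower() else "other"


def run_automation(
    instructions: str,
    events: List[str],
    agent: Callable[[str, str], str],
    needs_review: Callable[[WorkItem], bool],
    reviewer: Callable[[WorkItem], bool],
) -> List[WorkItem]:
    """Automation runtime: apply the same instructions to every event, escalating when needed."""
    items: List[WorkItem] = []
    for event in events:
        item = WorkItem(event=event)
        item.result = agent(instructions, event)
        if needs_review(item):
            # Human in the loop: the reviewer approves or rejects the agent's output.
            item.status = "approved" if reviewer(item) else "rejected"
        else:
            item.status = "done"
        items.append(item)
    return items


if __name__ == "__main__":
    items = run_automation(
        INSTRUCTIONS,
        ["Invoice #12 attached", "Lunch on Friday?"],
        agent=stub_agent,
        needs_review=lambda item: item.result == "invoice",  # escalate only invoices
        reviewer=lambda item: True,  # stand-in for a human who approves
    )
    for item in items:
        print(item.event, "->", item.result, item.status)
```

Routing only some outputs to `reviewer` mirrors the idea that humans are in the loop "when needed" rather than on every item; which outputs need review is a design decision, modeled here by the hypothetical `needs_review` predicate.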

Yet, there is one big problem that has prevented the deployment of Wave 3 AI agent applications in enterprise settings. Reliability!

Wave 4: AI agents + reliability

A traditional software program is completely deterministic. All scenarios have to be accounted for in the code, and the behavior of the code is completely predictable. Of course, the challenge is that such a program is extremely rigid and only handles situations that perfectly fit its assumptions.

AI agent applications are exciting because they can do better than such deterministic programs. We want AI agents to do sensible "intelligent" things to handle unexpected anomalies and variations in the inputs and environment. After all, that is what we expect of intelligent human work and so we expect the same of intelligent AI agents.

And indeed, AI agent applications can often deliver on this promise, using their autonomy to make dynamic decisions and their agency to act on them. The underlying AI models can generalize from their massive training data of language and world knowledge, interpret the designer's instructions, and apply them sensibly to the context at hand. Yet most Wave 3 agent applications also do worse than deterministic programs in one significant dimension, and this too is a fundamental consequence of the nature of AI models: the models are probabilistic. As a result, they make mistakes, and this makes the agents difficult to trust in a business environment that values reliable and repeatable work.

Automating business work requires 99%+ reliability: correctness, repeatability, and predictability. This brings us to Wave 4: AI agents + reliability. In this new wave focused on reliability, Thunk.AI is the leading AI agent platform.
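One way to see why the bar sits so high: agent workflows often chain many steps, and if we assume (as a simplification) that each step succeeds independently with the same probability p, an n-step workflow succeeds with probability p^n. At 95% per step, a ten-step workflow completes correctly only about 60% of the time; at 99% per step, about 90%. The hypothetical helper below just computes that product.

```python
def workflow_success_rate(step_reliability: float, steps: int) -> float:
    """Probability that every step succeeds, ASSUMING steps fail independently
    and share the same per-step reliability (a simplification, not a real model)."""
    if not 0.0 <= step_reliability <= 1.0:
        raise ValueError("step_reliability must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_reliability ** steps


if __name__ == "__main__":
    # Compare per-step reliability levels for a hypothetical ten-step workflow.
    for p in (0.90, 0.95, 0.99, 0.999):
        print(f"per-step {p:.3f} -> 10-step workflow {workflow_success_rate(p, 10):.3f}")
```

Real agent failures are rarely independent, so this is an intuition for how small per-step error rates compound, not a reliability measurement.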
