AI Reliability is the central motivation for the design of the Thunk.AI platform.
The behavior of an agentic automation should match the intent of the user who designed it to model a business process. This property is called AI Reliability.
AI Reliability includes correctness of results, compliance with the desired process flow, and consistency of outcomes across repeated executions of the automation.
A platform that implements AI agentic automation has three conceptual layers relevant to AI reliability:
The Application Model
The Environment
The AI agent
The application model is a representation of the user’s automation intent, captured in the form of instructions.
Each time the agentic automation executes, the automation platform creates an environment within which to orchestrate and host AI agents. The environment provides each agent with an appropriately scoped representation of the user's instructions, relevant context, and AI tools; it hosts tool execution and implements other checks and balances.
Each AI agent utilizes an LLM (AI model) as an intelligent interpreter of the instructions provided. It may call an LLM many times (in an “agentic loop”) and decide when to terminate its activity.
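The agentic loop can be sketched as follows. This is a minimal illustration, not the platform's implementation: `fake_llm`, `run_agent`, and the tool names are hypothetical stand-ins for a real model call and tool registry.

```python
# Minimal sketch of an agentic loop: the agent repeatedly calls an LLM,
# executes any tool the LLM requests, and lets the LLM decide when to stop.

def fake_llm(instructions, context):
    # Stand-in for a real model call; scripted to request one tool,
    # then finish once the tool result is in context.
    if "tool_result" not in context:
        return {"action": "tool", "name": "lookup", "args": {"key": "order_42"}}
    return {"action": "finish", "result": f"Status: {context['tool_result']}"}

def run_agent(instructions, tools, llm=fake_llm, max_turns=10):
    context = {}
    for _ in range(max_turns):  # bound the loop so the agent always halts
        response = llm(instructions, context)
        if response["action"] == "finish":
            return response["result"]
        # Execute the requested tool and feed its output back as context.
        context["tool_result"] = tools[response["name"]](**response["args"])
    raise RuntimeError("agent did not terminate within max_turns")

tools = {"lookup": lambda key: "shipped"}
print(run_agent("Report the order status.", tools))  # → Status: shipped
```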
AI agents are more reliable if their granularity is small.
Minimal autonomy: the narrower the instructions given to the LLM and the narrower the decisions it is expected to make, the less variability there will be in the decision outcomes.
Minimal context: LLMs interact with the business environment through content and data fetched from other business systems via “tools”. This information acts as the “context” for the LLM’s decisions. The less irrelevant context is provided, the less opportunity there is for divergence from the desired intent.
Minimal agency: the narrower the possible set of action responses from an LLM, the less the variability of those action response choices.
The application model needs to encourage or at least enable breaking intent into smaller granularity units that minimize autonomy, context, and agency.
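One way to picture this decomposition is as a set of narrow steps, each declaring only the tools and context fields it needs. This is an illustrative sketch with hypothetical names (`make_step`, `scoped_context`), not the platform's application model.

```python
# Sketch: break one broad intent into narrow steps, each with a scoped
# tool set (minimal agency) and only the relevant context (minimal context).

def make_step(instruction, allowed_tools, context_keys):
    """Bundle a narrow instruction with the tools and context it may use."""
    return {"instruction": instruction,
            "tools": allowed_tools,
            "context_keys": context_keys}

# Three narrow steps for a hypothetical invoice workflow:
steps = [
    make_step("Extract the invoice total.", ["ocr"], ["document"]),
    make_step("Validate the total against the PO.", ["po_lookup"], ["total", "po_id"]),
    make_step("Route for approval if total > limit.", ["notify"], ["total", "limit"]),
]

def scoped_context(full_context, step):
    # Hand each step only the context it declares, nothing else.
    return {k: full_context[k] for k in step["context_keys"] if k in full_context}

ctx = {"document": "...", "total": 1200, "po_id": "PO-9", "limit": 1000}
print(scoped_context(ctx, steps[1]))  # only 'total' and 'po_id' are passed
```

Narrowing both the tool list and the context per step limits the variability of each LLM decision, which is the point of all three "minimal" principles above.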
Persisting and reusing decisions increases consistency.
Design-time Planning: if the intent expressed by the user is translated at design time into concrete decisions (e.g., a deterministic sequence of steps to follow), it improves the consistency of AI automation.
Run-time Planning: if the agentic execution environment records dynamic AI agent decisions and reuses them where appropriate, it improves the consistency of AI automation.
Both the design and execution environment should emphasize planning and the reuse of plans where appropriate.
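Run-time plan reuse can be sketched as a cache keyed by the decision's inputs: plan once on first encounter, then replay the persisted plan on repeated runs. `plan_with_llm` and the cache key are illustrative assumptions, not the platform's mechanism.

```python
# Sketch of run-time plan reuse: persist the plan an agent produced for a
# given (instruction, input kind) and replay it on later executions.

plan_cache = {}

def plan_with_llm(instruction, input_kind):
    # Stand-in for a real planning call; pretend the LLM produced
    # a deterministic sequence of steps.
    return ["fetch", "validate", "summarize"]

def get_plan(instruction, input_kind):
    key = (instruction, input_kind)
    if key not in plan_cache:              # plan once at first encounter...
        plan_cache[key] = plan_with_llm(instruction, input_kind)
    return plan_cache[key]                 # ...then reuse on repeated runs

first = get_plan("Process refund", "email")
second = get_plan("Process refund", "email")
print(first is second)  # → True: the same persisted plan is reused
```

Because the second run replays the recorded plan rather than asking the LLM again, repeated executions follow the same steps, which is the consistency benefit described above.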
AI agents stay more focused if they explain and record their progress as they work.
Reflection: if agents require LLMs to explicitly explain their responses, it increases the alignment of the LLM's immediate response with the desired instructions and reinforces the alignment of subsequent responses.
Checkpointing: if agents are required to record partial progress in persisted workflow state, it improves alignment not just within one agent, but also across multiple agents that may operate in sequence.
The automation environment should require the agents (and the LLMs they use) to use reflection and checkpointing.
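Both requirements can be sketched together: each step must supply an explanation (reflection) and each step's result is written to persisted workflow state (checkpointing) so later agents can resume from it. The function and file names are hypothetical.

```python
# Sketch of enforced reflection and checkpointing: a step without an
# explanation is rejected, and every accepted step is persisted to disk.
import json
import os
import tempfile

def run_step(state_path, step_name, result, explanation):
    if not explanation:
        raise ValueError("reflection required: every step must explain itself")
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:        # resume from prior checkpoints
            state = json.load(f)
    state[step_name] = {"result": result, "explanation": explanation}
    with open(state_path, "w") as f:       # checkpoint partial progress
        json.dump(state, f)
    return state

path = os.path.join(tempfile.mkdtemp(), "workflow_state.json")
run_step(path, "classify", "invoice", "Matched the 'invoice' template.")
state = run_step(path, "extract", {"total": 42}, "Read total from line items.")
print(sorted(state))  # → ['classify', 'extract']
```

Because the state file survives between steps, a second agent (or a restarted one) sees everything recorded so far, which is what carries alignment across agents operating in sequence.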
Early detection of errors increases reliability.
Design-time Verification: if the intent expressed by the user is checked at design-time for inconsistencies and incompleteness, it reduces the opportunity for “human error” in the specification of the automated process.
Run-time Verification: if the agentic environment checks for errors in all agent responses and tool responses, it provides the opportunity to course-correct and achieve a reliable outcome.
The design and execution environment should prioritize verification of both agent results and tool results, with a feedback loop that leads to the agents (and the LLMs they use) potentially correcting their mistakes.
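The feedback loop can be sketched as a verify-and-retry wrapper around an agent call: failed verification produces feedback that is fed back into the next attempt. The verifier, the agent, and the retry policy here are illustrative stand-ins.

```python
# Sketch of run-time verification with a correction loop: every agent
# response is checked, and a failure is turned into feedback for a retry.

def flaky_agent(task, feedback=None):
    # Stand-in agent: the first attempt is out of range; given feedback,
    # it "corrects" itself.
    return 150 if feedback is None else 42

def verify(value):
    # Example check: the result must fall in an expected range.
    return 0 <= value <= 100

def run_with_verification(agent, task, max_retries=3):
    feedback = None
    for _ in range(max_retries):
        result = agent(task, feedback)
        if verify(result):
            return result
        feedback = f"Result {result} failed verification; try again."
    raise RuntimeError("verification failed after retries")

print(run_with_verification(flaky_agent, "estimate a percentage"))  # → 42
```

The same wrapper shape applies to tool responses: a tool error becomes feedback into the agent's next LLM call rather than silently propagating downstream.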
These AI reliability principles guide the design of all aspects of the Thunk.AI platform. This results in a best-of-breed platform for AI agentic automation. One concrete demonstration of the platform’s end-to-end reliability is captured by the Hi-Fi Reliability Benchmark.
In practice, the end-to-end reliability of AI agent automation depends on a combination of two factors:
1. The nature of the workflow process: how specific the process is, how much "intelligent" decision-making is expected from AI agents to handle variability of inputs and contexts, and the degree of variability of the runtime workflow inputs and data.
2. The design choices of the AI agentic automation platform:
The clarity and granularity of user intent, which depends on the application model of the platform and how the design environment guides the user to leverage the application model effectively.
The control exerted by the execution runtime, shaping the agentic environment and the agent execution to conform to the desired intent.