A reference architecture for agentic AI in the regulated enterprise

McKinsey published "rethinking enterprise architecture for the agentic era" earlier this year. The strategic direction is right. The article does not, and could not, go to the level of architectural specificity a practising architect needs.

This piece is the working reference architecture I use when actually delivering this work in a regulated enterprise. It is calibrated against my experience as the architecture leader for an organisation operating in multiple regulated jurisdictions, plus a couple of production deployments I have shipped.

The five-layer reference

A working agentic AI architecture in a regulated firm has five layers. Each is necessary; none is sufficient on its own.

Layer 1: Foundation models. The underlying models (GPT-class, Claude-class, Gemini-class, plus domain-specific models). The firm should treat this as a substitutable layer. The architecture function's job is to make it cleanly substitutable.
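One way to make the layer substitutable is to have everything above it depend on a narrow interface rather than a vendor SDK. The sketch below illustrates the idea only: ChatModel, EchoModel, the PROVIDERS table and the provider keys are hypothetical names, and a real adapter would wrap whichever vendor client the firm actually uses.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical minimal interface the rest of the stack depends on."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in provider for illustration; a real adapter would call a vendor SDK."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Provider choice lives in configuration, not in agent code.
PROVIDERS: dict[str, ChatModel] = {
    "vendor_a": EchoModel("vendor_a"),
    "vendor_b": EchoModel("vendor_b"),
}


def get_model(provider_key: str) -> ChatModel:
    """Resolve the configured provider; unknown keys fail loudly."""
    try:
        return PROVIDERS[provider_key]
    except KeyError:
        raise ValueError(f"unknown model provider: {provider_key}") from None


def summarise(model: ChatModel, text: str) -> str:
    """Agent-side code sees only the ChatModel interface."""
    return model.complete(f"Summarise: {text}")
```

Swapping providers then means changing the key passed to get_model; summarise and everything else above the interface stay untouched.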

Layer 2: Model serving. Where the models execute. For regulated firms, this is rarely the model vendor's public API in production; it is more often the model vendor's enterprise tenant, a private deployment, or a sovereign instance. The architecture choice here is material and is covered separately in "Data residency for AI workloads".

Layer 3: Agent runtime. The orchestration layer that turns model calls into agent behaviour: tool calling, memory, multi-step reasoning and observability. The firm should own this layer; outsourcing it to a single vendor creates lock-in.

Layer 4: Tool gateway. The mediated interface between the agents and the firm's existing systems. Policy enforcement, audit logging, rate limiting, authentication. This is where MCP integration lands in practice. See "MCP is the most important enterprise standard nobody is implementing".
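As a rough sketch of what "mediated" means here: the agent never holds a direct handle to a backend system; it asks the gateway, and the gateway checks policy, applies a call budget, records the outcome and only then forwards the call. ToolGateway, GatewayDenied and the outcome strings are hypothetical names. The per-agent budget is a crude stand-in for real rate limiting, and authentication is assumed to have happened before agent_id reaches this code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class GatewayDenied(Exception):
    """Raised when the gateway refuses a tool call."""


@dataclass
class ToolGateway:
    # Callback into the policy engine: (agent_id, tool) -> allowed?
    authorise: Callable[[str, str], bool]
    # Tool name -> callable that reaches the underlying system.
    tools: dict[str, Callable[..., Any]] = field(default_factory=dict)
    # Lifetime call budget per agent (illustrative, not a windowed rate limit).
    max_calls_per_agent: int = 100
    calls_made: dict[str, int] = field(default_factory=dict)
    audit: list[dict[str, Any]] = field(default_factory=list)

    def call(self, agent_id: str, tool: str, **kwargs: Any) -> Any:
        used = self.calls_made.get(agent_id, 0)
        if tool not in self.tools:
            outcome = "deny:unknown_tool"
        elif not self.authorise(agent_id, tool):
            outcome = "deny:policy"
        elif used >= self.max_calls_per_agent:
            outcome = "deny:rate_limit"
        else:
            outcome = "allow"
        # The gateway writes the record whatever the outcome.
        self.audit.append(
            {"agent": agent_id, "tool": tool, "args": kwargs, "outcome": outcome}
        )
        if outcome != "allow":
            raise GatewayDenied(outcome)
        self.calls_made[agent_id] = used + 1
        return self.tools[tool](**kwargs)
```

Denied calls are recorded as well as allowed ones, since refused attempts are often the more interesting evidence after an incident.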

Layer 5: Domain applications. The agents themselves, calibrated to specific business processes. Each agent has a defined purpose, a defined authority boundary, a defined operating envelope. This is where the value is captured and where the use case diversity lives.

The platform components

Across the five layers, six platform components recur in every deployment.

The agent registry. Single canonical record of every agent in the firm. Purpose, owner, authorisation policy, model and prompt configuration, lifecycle status, deployment environments, accountable senior manager. The registry is the source of truth; nothing operates in production without an entry.
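A registry entry can be sketched as a plain record whose fields mirror the list above. The field names, the Lifecycle values and the may_run check are assumptions made for illustration; a production registry would sit on a durable store with its own change control.

```python
from dataclasses import dataclass
from enum import Enum


class Lifecycle(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    RETIRED = "retired"


@dataclass(frozen=True)
class AgentRecord:
    agent_id: str
    purpose: str
    owner: str
    accountable_senior_manager: str
    allowed_tools: frozenset[str]   # the authorisation policy, reduced to a tool list
    model_config_ref: str           # pointer to model and prompt configuration
    lifecycle: Lifecycle
    environments: frozenset[str]


class AgentRegistry:
    def __init__(self) -> None:
        self._records: dict[str, AgentRecord] = {}

    def register(self, record: AgentRecord) -> None:
        if not record.accountable_senior_manager.strip():
            raise ValueError("every agent needs a named accountable senior manager")
        self._records[record.agent_id] = record

    def may_run(self, agent_id: str, environment: str) -> bool:
        """No entry, no approval, or wrong environment all mean the agent does not run."""
        record = self._records.get(agent_id)
        return (
            record is not None
            and record.lifecycle is Lifecycle.APPROVED
            and environment in record.environments
        )
```

The may_run check is the "nothing operates in production without an entry" rule expressed as code: an unknown agent simply gets False.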

The policy engine. Authorisation decisions are made by the policy engine. Agents request access; the engine allows or denies. The engine reads from the agent registry and from the firm's broader authorisation policy. Its decisions are auditable independently of the agent.
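The decision logic can be sketched in a few lines, assuming the registry grants and the firm-wide policy have each been reduced to simple sets. PolicyEngine, Decision and the reason strings are hypothetical; real engines usually evaluate richer rules, but the shape is the same: every path that is not an explicit grant ends in a denial, and the engine keeps its own record of what it decided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


class PolicyEngine:
    def __init__(
        self,
        registry_grants: dict[str, set[str]],
        firm_wide_blocked: set[str],
    ) -> None:
        self._grants = registry_grants      # agent id -> actions granted in the registry
        self._blocked = firm_wide_blocked   # actions no agent may take
        # The engine's own decision trail, kept apart from anything the agent emits.
        self.decisions: list[tuple[str, str, Decision]] = []

    def authorise(self, agent_id: str, action: str) -> Decision:
        if agent_id not in self._grants:
            decision = Decision(False, "agent not in registry")
        elif action in self._blocked:
            decision = Decision(False, "blocked by firm-wide policy")
        elif action not in self._grants[agent_id]:
            decision = Decision(False, "action not granted in registry")
        else:
            decision = Decision(True, "granted in registry")
        self.decisions.append((agent_id, action, decision))
        return decision
```

Note that a firm-wide block wins over a registry grant in this sketch; that ordering is a design choice, not a given.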

The audit log. Immutable record of every action taken by every agent. Tamper-evident, queryable, retained for the regulatory retention period. The audit log is the firm's primary evidence in the event of an incident or a regulatory query.
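Tamper evidence is commonly achieved by chaining entries with a hash, so that altering any past entry breaks every hash after it. The following is a minimal in-memory sketch of that technique using SHA-256 from the Python standard library; AuditLog and its field names are illustrative. Chaining on its own detects tampering but does not prevent it, so immutability still depends on where the entries are stored.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry_hash = _digest(prev_hash, event)
        self.entries.append({"event": event, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed or reordered entry fails the check."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != _digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True
```

Publishing the latest hash somewhere the platform team cannot rewrite is one way to make truncation of the tail detectable too; that step is outside this sketch.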

The override interface. Human operators can override agent decisions. The override is logged as a first-class event, attributed to the named operator, and reviewed periodically for systemic patterns.
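Treating the override as a first-class event means it gets its own record type rather than a free-text note on the agent's log. A minimal sketch, with OverrideEvent, OverrideLog and the field names as illustrative assumptions:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OverrideEvent:
    agent_id: str
    operator: str               # the named human, never a shared account
    original_decision: str
    replacement_decision: str
    reason: str
    at: datetime


class OverrideLog:
    def __init__(self) -> None:
        self.events: list[OverrideEvent] = []

    def record(
        self,
        agent_id: str,
        operator: str,
        original_decision: str,
        replacement_decision: str,
        reason: str,
    ) -> OverrideEvent:
        if not operator.strip():
            raise ValueError("an override must be attributed to a named operator")
        event = OverrideEvent(
            agent_id=agent_id,
            operator=operator,
            original_decision=original_decision,
            replacement_decision=replacement_decision,
            reason=reason,
            at=datetime.now(timezone.utc),
        )
        self.events.append(event)
        return event

    def overrides_per_agent(self) -> Counter:
        """Input to the periodic review: which agents are being overridden most."""
        return Counter(event.agent_id for event in self.events)
```

An agent that is overridden far more often than its peers is a candidate for a systemic fix rather than more overrides.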

The monitoring and observability layer. Real-time visibility into agent behaviour: anomaly detection, performance monitoring, SLA compliance and cost attribution.

The incident workflow. When something goes wrong (an unexpected tool call, a quality threshold breach, a customer complaint that traces back to agent behaviour), the incident workflow notifies, captures and resolves.

The non-negotiable design principles

There are four principles in the reference architecture that I will not compromise on.

1. The agent has no authority that has not been explicitly granted. Default deny. Every action the agent can take is enumerable from the agent registry. If the registry says no, the policy engine says no.

2. The audit log is immutable and is generated by the platform, not the agent. The agent cannot decide what to log. The platform observes and logs.

3. The override path is documented and tested. A human operator can stop an agent within an explicit SLA. The path is exercised in production-like conditions regularly.

4. The accountability is named. Each agent has a named senior manager. The Senior Managers and Certification Regime (SMCR) or its equivalent already requires this in regulated firms; the architecture reinforces it.
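Principles 2 and 3 both come down to where control sits: in the platform that runs the agent, not in the agent. The sketch below assumes a hypothetical PlatformRuntime through which every agent step is executed. The runtime writes the log entry before the step runs, and it checks an operator-controlled halt flag on every step. It says nothing about the SLA itself, which depends on how often steps pass through this check.

```python
from typing import Any, Callable


class AgentHalted(Exception):
    """Raised when a halted agent attempts another step."""


class PlatformRuntime:
    def __init__(self) -> None:
        self.log: list[dict[str, Any]] = []
        self._halted: set[str] = set()

    def halt(self, agent_id: str, operator: str) -> None:
        """Operator stop: takes effect at the agent's next step."""
        self._halted.add(agent_id)
        self.log.append({"type": "halt", "agent": agent_id, "operator": operator})

    def execute(self, agent_id: str, action: str, step: Callable[[], Any]) -> Any:
        if agent_id in self._halted:
            self.log.append({"type": "refused", "agent": agent_id, "action": action})
            raise AgentHalted(agent_id)
        # Logged by the platform before the step runs; the agent has no say in it.
        self.log.append({"type": "action", "agent": agent_id, "action": action})
        return step()
```

Exercising the override path then becomes a test that can be run on a schedule: call halt, attempt a step, and confirm that AgentHalted is raised and the refusal is logged.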

The trade-offs the reference does not resolve

Three trade-offs are firm-specific and the reference intentionally leaves them open.

Build vs buy on the agent runtime. A bespoke runtime gives full control at full operating cost. A vendor runtime gives faster time-to-value and a vendor dependency. The right answer depends on the firm's existing engineering capacity, the strategic importance of the agent capability, and the firm's vendor risk posture.

Centralised vs federated platform ownership. A centralised platform team owns the agent platform and serves the use case teams. A federated model gives each use case team its own platform stack with central standards. The trade-off is consistency vs autonomy. Most regulated firms should start centralised and relax over time.

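MCP-native vs custom integration. The question is whether the tool gateway integrates with the firm's systems through MCP or through custom connectors. It is touched on under Layer 4 above and covered in the MCP piece.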

Where this leaves the firm

The reference architecture above is what I have seen work in practice. It is not the only configuration that works; it is the one I have the most confidence in for regulated environments.

For firms starting this work in 2026, my recommendation is to invest in the platform components (the Layer 4 tool gateway, the agent registry, the policy engine and the audit log) before the first agent goes into production. Building the platform afterwards is significantly more expensive than building it first.