Tarun Bulchandani

Top trends in enterprise architecture 2026

2026-11-05T00:00:00.000Z

Capgemini publishes top-trends pieces for banking, insurance and financial services each year. BCG runs the AI Radar. PwC publishes its UK economic predictions. McKinsey runs its State of AI series. Nobody publishes a working architect's top-trends piece for the enterprise architecture practice itself.

This is the first annual version of one. It is written from the practitioner side and is calibrated against what I see in my own work and in conversations with peers in the function.

Trend 1: the EA function carries more direct delivery weight

In 2024, the EA function was largely a reviewing function: standards, frameworks, governance reviews, target-state architecture. By the end of 2026, the firms taking AI seriously have moved the function into direct delivery on platform components (agent platform, identity platform, data platform).

The shift is real and not yet reflected in most EA function staffing. The function that delivered well in 2024 is under-resourced for the 2026 demand.

Trend 2: TOGAF and the equivalent frameworks need adaptation, not replacement

TOGAF and the other established EA frameworks were designed for an environment with slower change, more deterministic workloads and clearer ownership boundaries. The agentic era stretches all three. The frameworks still work but they need adaptation: faster iteration of the architecture position, explicit treatment of stochastic workloads, clearer rules for agent-driven integration.

The firms claiming TOGAF is dead are overstating the case. The firms that ignore the adaptation question are understating it.

Trend 3: the commercial EA tool market is restructuring

I wrote about this in The EA tool market has 18 months, and the intervening twelve months have, if anything, accelerated the shift. The major commercial EA tools are being absorbed into broader platforms or are losing relevance to AI-augmented internal alternatives. The EA function has to make a deliberate choice rather than inherit one.

Trend 4: architecture decision records become a living practice again

ADRs were a 2010s discipline that decayed in many firms. The agentic era has reinvigorated the practice for a specific reason: AI agents can read ADRs and use them to inform code generation, refactoring recommendations, and integration design. ADRs that were a documentation chore are becoming an operational artefact.

See The evolving role of architecture decision records in the age of generative AI.

Trend 5: fitness functions become measurable

Architectural fitness functions have been a concept since the early 2010s but were rarely measured in practice. The observability investment around agent platforms has, as a side effect, made many of the fitness function metrics genuinely measurable. The EA function can now operate against measured fitness functions rather than asserted ones. See Architectural fitness functions: a practical framework.
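A minimal sketch of the difference between an asserted and a measured fitness function, assuming the observability layer can export latency samples. The `FitnessFunction` class, the `p95` helper, the sample values and the 800 ms threshold are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample."""
    ordered = sorted(samples)
    # Nearest-rank method: rank is the ceiling of 0.95 * n, at least 1.
    rank = max(1, -(-95 * len(ordered) // 100))
    return ordered[rank - 1]


@dataclass(frozen=True)
class FitnessFunction:
    name: str
    measure: Callable[[Sequence[float]], float]
    threshold: float  # the measured value must not exceed this

    def evaluate(self, samples: Sequence[float]) -> bool:
        return self.measure(samples) <= self.threshold


# Hypothetical latency samples (milliseconds) exported by the observability layer.
latencies_ms = [120, 180, 200, 240, 310, 330, 400, 450, 520, 950]
tool_call_latency = FitnessFunction("tool-call p95 latency", p95, threshold=800.0)
passed = tool_call_latency.evaluate(latencies_ms)
```

The point of the shape is that the threshold is written down once and the verdict comes from measured data, not from an architect's assertion in a review meeting.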

Trend 6: enterprise architecture and the regulatory function move closer together

The convergence is structural. The regulatory function has more direct dependency on architectural choices (AI use cases, data residency, agent governance, model inventory). The architecture function has more direct exposure to regulatory enforcement (SS1/23, EU AI Act, operational resilience). The two functions have to work as one team, not as two adjacent functions.

Firms that have not made this organisational shift will do so in 2026 or 2027.

Trend 7: MCP and equivalent standards become the operating norm

Twelve months ago, MCP was a curiosity. Twelve months from now, it will be assumed. The firms that have not adopted it will be the exception, and the cost of being the exception will be measurable. See MCP is the most important enterprise standard nobody is implementing for the context.

Trend 8: the platform-team-vs-product-team boundary gets redrawn

The DevOps consolidation of the late 2010s blurred the boundary between platform teams and product teams. The agentic era has prompted a clearer redraw: platform teams own the foundational components (model serving, agent platform, observability, identity); product teams own the use cases that consume them. The architecture function has to be explicit about which is which.

Trend 9: the architecture function develops a buy-side discipline

The vendor selection decisions in 2026 are larger and more consequential than in any previous EA cycle. Foundation model vendors, agent platform vendors, SaaS vendors with embedded AI. The architecture function has to develop a buy-side discipline that operates at the level the decisions require, with proper criteria, proper diligence and proper negotiation support. Most firms have not invested in this capability.

Trend 10: the architect-as-builder model gains traction

A small but growing number of architecture leaders are shipping code, not just specifications. The Meridian and CANVAS systems I built at Sonnedix sit in this category. The pattern is not appropriate for every firm or every architect, but where it works it delivers materially faster than the specification-driven model. See the Meridian case study and the CANVAS case study.

Where this leaves the function

The EA function in 2026 is materially different from the function in 2022. More delivery weight, more regulatory exposure, more direct ownership of platform components, more accountability for vendor decisions.

The firms whose EA functions adapt to this carry the agentic transition well. The firms whose EA functions remain in the reviewing posture will struggle.

This piece will be revisited annually. The 2027 version will mark which of these trends accelerated, which plateaued, and which were overstated.

The CIO's AI agenda for 2026: an architect's read

2026-10-29T00:00:00.000Z

Capgemini publishes "when IT meets AI: the CIO perspective" pieces; Bain runs its CIO conversation series; McKinsey publishes its CIO-track work. Each piece is calibrated to the executive audience. The architecture function's read of the same agenda is specific: which workstreams the CIO has to fund, who has to own them, and what the architecture function has to deliver.

This piece is that read, organised around the seven workstreams I think the CIO actually has to land in 2026.

Workstream 1: the agentic AI platform

This is by far the largest single line in the 2026 IT budget in firms taking AI seriously. The platform components (registry, tool gateway, observability, policy engine) are covered in the platform strategy piece and the reference architecture piece.

Ownership: architecture function, with engineering delivery.

What goes wrong: the platform is under-funded relative to the use cases it has to support. The use case teams build their own and the firm ends up with multiple incompatible stacks.

Workstream 2: the AI governance regime

Model risk management, agent risk management, AI incident response, model and agent registries, audit support, regulatory engagement.

Ownership: shared between the architecture function, the model risk function and the compliance function. The CIO's job is to fund the shared infrastructure.

What goes wrong: each function builds its own AI governance. The firm has three governance regimes that do not reconcile and three teams duplicating effort.

Workstream 3: the legacy modernisation portfolio

The pre-existing modernisation programme, which has not gone away. ERP transformations, core banking modernisations, legacy mainframe migration, end-of-life software replacement.

The agentic shift has changed the integration demands and the data demands of these programmes. The modernisation portfolio has to be re-baselined.

Ownership: each programme's own leadership, with architecture function oversight.

What goes wrong: the modernisation programmes proceed on their original assumptions and have to be re-cut mid-flight. The cost of mid-flight re-cuts is materially higher than the cost of upfront re-baselining.

Workstream 4: the cyber and identity uplift

The agent population is, structurally, a population of non-human identities. The legacy identity treatment does not work for them. The cyber control surface for agent-driven workflows is different from the human-driven equivalent.

Ownership: CISO and architecture function jointly.

What goes wrong: the cyber uplift is treated as incremental to the existing cyber programme. The specific agent-driven threats are not designed for. See Non-human identity in the age of AI agents and Cyber guardrails for AI agents in regulated workflows.

Workstream 5: the data platform consolidation

Integrated reporting, ESG data, operational analytics and the agentic AI workloads all require a coherent data platform. Most firms are running multiple incompatible data platforms accumulated over the last decade.

Ownership: architecture function and the data function, where the data function exists.

What goes wrong: the consolidation is deferred because the immediate cost is visible and the benefit accrues gradually. The firm ends up paying both the legacy cost and the new platform cost concurrently for years. See Integrated reporting and the enterprise architecture function.

Workstream 6: the vendor and outsource discipline

Foundation model vendors, AI tool vendors, SaaS vendors with embedded AI features. Each is, in regulated firms, an outsource arrangement. The vendor selection has to apply the outsource discipline rather than the technology procurement discipline.

Ownership: architecture function, supplier management function, regulatory function.

What goes wrong: the vendor selection happens in procurement on standard procurement terms. The outsource discipline is applied retroactively, often after the contract is signed. The renegotiation is expensive and sometimes infeasible.

Workstream 7: the cost discipline

Agent workloads can become expensive quickly. The foundation model costs, the inference compute costs, the observability storage costs and the audit retention costs all compound. Without cost attribution and cost control, the firm discovers the overspend only after it has been incurred.

Ownership: CIO, FinOps function, architecture function.

What goes wrong: cost attribution is built after the cost surprise rather than before it. The use case teams have no incentive to manage their consumption.
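A minimal sketch of per-use-case, per-agent cost attribution, assuming each model call emits a usage record with token counts. The `UsageRecord` shape, the names and the prices are hypothetical; real rates depend on the vendor contract, and a real implementation would also attribute compute, storage and retention costs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    use_case: str
    agent: str
    input_tokens: int
    output_tokens: int


# Hypothetical prices per 1,000 tokens, for illustration only.
PRICE_PER_1K_INPUT = 0.5
PRICE_PER_1K_OUTPUT = 1.5


def attribute_costs(records):
    """Sum model cost per (use_case, agent) pair."""
    totals = defaultdict(float)
    for r in records:
        cost = (r.input_tokens / 1000) * PRICE_PER_1K_INPUT
        cost += (r.output_tokens / 1000) * PRICE_PER_1K_OUTPUT
        totals[(r.use_case, r.agent)] += cost
    return dict(totals)


records = [
    UsageRecord("claims", "triage", 2000, 1000),
    UsageRecord("claims", "triage", 4000, 0),
    UsageRecord("kyc", "screening", 1000, 2000),
]
costs = attribute_costs(records)
```

Keying the totals by use case and agent is what gives the use case teams a number they can be held to before the surprise lands.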

The funding shape

A 2026 IT budget that does these seven workstreams well looks materially different from a 2024 budget. Three shifts.

More platform investment, less use-case investment. The platform components above pay back across many use cases. Funding them properly is more efficient than funding each use case to build its own.

More governance investment. The model risk function, the regulatory function and the architecture function all need more capacity than the 2024 baseline.

More observability and cost discipline. The operational characteristics of the agent estate require investment in the run-time discipline, not just the build-time delivery.

Where this leaves the CIO

The CIO that funds these seven workstreams well in 2026 puts the firm in a position to capture the agentic AI value the consultancy commentary points at. The CIO that funds the use cases without funding the underlying workstreams will spend 2027 and 2028 rebuilding.

The architecture function's job is to make the case for the underlying workstreams clearly and defensibly. The CIO's job is to back the case.

Integrated reporting and the enterprise architecture function

2026-10-22T00:00:00.000Z

EY published "how integrated reporting can give you the whole story" earlier this year. The piece argues that the conventional separation between financial reporting, ESG reporting, operational reporting and strategic narrative no longer serves the audiences that consume them. The integrated reporting concept (financial, non-financial, narrative woven together) is the proposed response.

The piece is written for the CFO and the audit committee. The architecture function carries more of the delivery than that framing implies, particularly as ESG and operational data move from manual spreadsheet collection to machine-readable, auditable, real-time feeds.

This piece sets out what the architecture function has to deliver to make integrated reporting genuinely useful.

What integrated reporting actually requires

Three categories of data have to flow into the same canonical reporting layer.

1. Financial data. The general ledger, the sub-ledgers, the consolidation engine. This is well-established in the existing finance estate.

2. ESG and sustainability data. Emissions data (Scope 1, 2 and 3), energy consumption, water, diversity metrics, governance metrics, supply chain visibility. In most firms, this data is collected through annual surveys, manual spreadsheets and opportunistic systems. The CSRD, ISSB and SEC climate disclosure frameworks have made the requirement firmer; the underlying data discipline is often still weak.

3. Operational and strategic data. Customer metrics, employee metrics, operational KPIs, strategic programme status. Scattered across CRM, HR systems, operational dashboards and the strategy team's spreadsheets.

The integrated reporting frame requires these to reconcile against each other, to be retrievable on the same cadence, and to support the same audit standards.

The architecture problems

Four problems show up in every implementation.

The data dictionary is inconsistent. The same concept is named differently in different systems. "Active customer" in CRM does not match "billed customer" in the billing engine which does not match "recognised revenue customer" in the general ledger. Without a defined data dictionary, the reports do not reconcile.

The cadence does not align. Financial close is monthly; ESG data collection is annual; operational data is daily. Integrated reporting requires a common cadence at least for the periods being reported, which forces investment in the slower data streams.

The audit standards differ. Financial data has been audited to a clear standard for decades. ESG data is moving toward audit, but the standards are still evolving. Operational data is rarely audited. The integrated report has to handle the variation explicitly rather than gloss over it.

The system boundaries do not match the reporting boundaries. The financial entity structure, the operational footprint, the legal entity structure and the ESG reporting boundary are all different. The data layer has to support translation between them.

What the architecture function has to deliver

Five components.

A common data dictionary. The architecture function defines, and the data governance function maintains, a common dictionary of concepts used in reporting. Includes financial concepts (revenue, EBITDA), operational concepts (active customers, churn), and ESG concepts (emissions intensity, water withdrawal). The dictionary is authoritative; the source systems reconcile to it, not the other way around.

An integrated data fabric. A data layer that exposes the dictionary concepts across the source systems. The implementation varies (warehouse, lakehouse, mesh) but the requirement is the same: auditable, queryable, versioned, governed.

A reporting cadence calendar. The defined cadence for each concept (daily, monthly, quarterly, annually) and the dependencies between them (operational data feeds quarterly reporting, financial close feeds annual reporting).

An audit trail. Every figure in the integrated report has to be traceable to its underlying data with the relevant audit evidence. The architecture function has to build this traceability into the data fabric; retrofitting it during the audit cycle is materially more expensive.

A change governance layer. Reporting concepts will change (new ESG standards, new operational metrics, new strategic programmes). The architecture function has to manage these changes without breaking year-over-year comparability or audit trail continuity.
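A minimal sketch of how the common data dictionary and the audit trail might fit together, assuming source fields are mapped to canonical concepts and every reported figure carries its source references. The system names, field names and the `ReportedFigure` shape are hypothetical.

```python
from dataclasses import dataclass

# Canonical concepts; the source systems reconcile to these, not the reverse.
CANONICAL_CONCEPTS = {"active_customer", "revenue", "emissions_intensity"}

# Hypothetical mapping from (source system, local field name) to canonical concept.
SOURCE_MAPPINGS = {
    ("crm", "active_cust_flag"): "active_customer",
    ("billing", "billed_cust"): "active_customer",
    ("gl", "recognised_revenue"): "revenue",
}


def resolve(system: str, field: str) -> str:
    """Return the canonical concept for a source field, or raise if unmapped."""
    try:
        concept = SOURCE_MAPPINGS[(system, field)]
    except KeyError:
        raise LookupError(f"{system}.{field} is not mapped to the dictionary") from None
    if concept not in CANONICAL_CONCEPTS:
        raise LookupError(f"{concept} is not a canonical concept")
    return concept


@dataclass(frozen=True)
class ReportedFigure:
    concept: str
    value: float
    sources: tuple  # (system, field) pairs the value was derived from

    def is_traceable(self) -> bool:
        # Traceable only if there is at least one source and every source
        # resolves to this figure's canonical concept.
        return bool(self.sources) and all(
            resolve(system, field) == self.concept for system, field in self.sources
        )


figure = ReportedFigure("active_customer", 1200.0, (("crm", "active_cust_flag"),))
```

A figure with no sources, or with a source that resolves to a different concept, fails the check at build time, not during the audit cycle.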

Where firms underspend

Two areas.

The ESG data fabric. Most firms are still treating ESG data as a separate workstream with its own systems and its own annual cadence. The integrated reporting frame requires it to land in the same data fabric as the financial data, with the same governance. Few firms have made this investment.

The traceability layer. Most firms can produce the integrated report; few can defend the specific numbers in it to the granularity an integrated audit will require. The architecture function should be investing in traceability before the audit pressure arrives.

Where this leaves the firm

Integrated reporting is, at the executive layer, a narrative project. At the architecture layer, it is a data platform project with specific governance requirements. The firms that get it right invest in the data platform first and the narrative second.

For firms doing this work in 2026, my recommendation is to start with the common data dictionary, then invest in the ESG data fabric, then build the traceability layer. The integrated report itself follows from those three.

The intelligent superhighway, translated: what AI-ready cloud foundations actually mean

2026-10-15T00:00:00.000Z

Accenture's "intelligent superhighway" framing and the companion "AI innovation is nonstop, your cloud foundation should be too" piece are the cleanest articulation of the AI-ready cloud foundation narrative in the public commentary. The pieces make the strategic case well: the enterprise cloud strategies of the last decade were written for a workload mix that no longer reflects what the firm actually runs.

The strategic case is correct. The architecture function still has to translate "intelligent superhighway" into specific design decisions. This piece does the translation.

What the marketing language actually means

Five elements turn up in every working AI-ready cloud foundation. The marketing language sometimes obscures what each one is for.

1. A unified data layer. The classical enterprise estate has data scattered across operational systems, analytical warehouses, data lakes and the various SaaS systems the firm has accumulated. The "unified data layer" is the architecture function's commitment to a single canonical view of the firm's data, accessible by agents at low latency and respecting the firm's data governance.

In practice: a properly designed data mesh or data fabric with explicit ownership, explicit quality contracts and explicit access controls. The technology choice (data mesh vs lakehouse vs warehouse) matters less than the governance discipline.

2. A low-latency model serving layer. Foundation models and bespoke models served close to the operational data, with predictable latency and predictable cost. For most regulated firms, this is the vendor's enterprise tenant in the right jurisdiction rather than the vendor's public API.

In practice: model deployment in the firm's cloud tenancy, with the same operational discipline as any other production workload (capacity planning, SLO monitoring, incident response).

3. A scalable tool gateway. Agents calling tools at scale require the tool gateway to handle the volume. Most existing enterprise integration platforms are not designed for this; they were built for transactional volumes, not for agent-driven volumes that can spike non-linearly.

In practice: a purpose-built tool gateway with strong rate limiting, queueing, observability and circuit breaking. The architecture function should expect to build this rather than buy it; the commercial options are still maturing.

4. A robust observability layer. The agent reasoning traces, tool calls, output samples and override events have to be captured, indexed and queryable. The volume is materially higher than classical application observability.

In practice: an observability stack tuned for the agent workload. Most firms underestimate the storage and indexing cost of this by a factor of three to five.

5. A governance fabric. Identity, authorisation, audit, change control, model and agent registries. Wired through the rest of the platform so the agent operating envelope is enforced consistently.

In practice: the agent platform components I have covered in the platform strategy piece and the reference architecture piece.
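The circuit breaking named under the tool gateway above can be sketched as follows. This is a minimal illustration, not a production design: the `CircuitBreaker` class, its parameters and the explicit `now` argument (passed in so the behaviour is easy to test) are all assumptions of mine.

```python
class CircuitBreaker:
    """Opens after `max_failures` consecutive failures. Once `reset_after`
    seconds have passed, one trial call is allowed; success closes the circuit."""

    def __init__(self, max_failures: int, reset_after: float):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # time the circuit opened, or None if closed

    def call(self, tool, now: float):
        if self.opened_at is not None and now - self.opened_at < self.reset_after:
            raise RuntimeError("circuit open: tool call rejected")
        try:
            result = tool()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = now  # open (or re-open) the circuit
            raise
        # Success: reset the failure count and close the circuit.
        self.failures = 0
        self.opened_at = None
        return result


def failing_tool():
    raise ValueError("downstream unavailable")


breaker = CircuitBreaker(max_failures=2, reset_after=30.0)
outcomes = []
for t in (0.0, 1.0, 5.0):
    try:
        breaker.call(failing_tool, now=t)
    except ValueError:
        outcomes.append("failed")
    except RuntimeError:
        outcomes.append("rejected")
# After the cooling-off period a successful trial call closes the circuit.
outcomes.append(breaker.call(lambda: "ok", now=40.0))
```

The third call is rejected without touching the downstream tool, which is the property that matters when agent-driven volume spikes against a struggling system.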

What "always-on innovation" actually requires

The "your cloud foundation should be too" framing points at something specific: the cloud foundation has to support continuous deployment of new models, new prompts, new agents, new tool integrations without destabilising production.

Three operational characteristics turn out to be necessary.

Continuous deployment with rollback. New agent versions deploy through a defined pipeline. Rollback is single-command. The audit trail of which version was running when is preserved.

Shadow deployment. New models, new prompts and new agent versions run alongside the production version on sampled traffic. Performance is compared before promotion. Most firms have not built this for the agent workload.

Canary and blast-radius control. New agent versions reach a small fraction of the production volume first. The blast radius of a regression is bounded. Promotion to full volume is a deliberate decision.

These are not new ideas; they are well-established deployment patterns from the conventional application estate. They have to be specifically applied to the agent estate and have to be funded.
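A minimal sketch of canary routing for agent versions, assuming each request carries a stable identifier. The function names and the percentage parameter are hypothetical. SHA-256 is used only to get a bucket that is stable across processes (Python's built-in `hash` is salted per process for strings).

```python
import hashlib


def bucket(request_id: str) -> int:
    """Map a request id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(request_id: str, canary_percent: int) -> str:
    """Send roughly `canary_percent` percent of requests to the new agent version."""
    return "canary" if bucket(request_id) < canary_percent else "stable"
```

Because the bucket is deterministic, a request that lands on the canary at 5 percent stays on it when the percentage is raised, so promotion widens the blast radius without reshuffling traffic.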

Where firms underspend

Three areas turn up reliably as underspent.

Observability. As above. The volume is a surprise the first time it lands.

Cost attribution. Agent workloads can become expensive quickly. Without per-agent, per-use-case cost attribution, the firm cannot make informed decisions about where to invest and where to retire. Most firms build cost attribution after the cost surprise rather than before it.

Pipeline tooling. The continuous deployment, shadow deployment and canary patterns above require pipeline tooling that most existing enterprise CI/CD systems do not provide out of the box. The investment is real and is rarely budgeted in the original AI-platform business case.

Where this leaves the firm

The AI-ready cloud foundation is, on the whole, a recognisable architecture pattern with new specifics. The architecture function's job is to be clear about the specifics, to fund the components that matter, and to resist the marketing-language abstraction when delivering the work.

For firms doing this work in 2026, my recommendation is to invest in observability and cost attribution before scaling the agent footprint, to build the deployment pipeline tooling as a platform investment rather than a per-use-case investment, and to keep the model serving layer as substitutable as the architecture can support.

Platform strategy for agentic AI: a working reference architecture

2026-10-08T00:00:00.000Z

Accenture published "rewriting platform strategy for agentic AI" earlier this year. The article makes the right strategic case: the existing enterprise platform strategy was written for a different kind of workload, and the agentic shift requires a substantive rewrite, not a marginal update.

The article is, necessarily, written at the strategic narrative layer. The architecture function that has to deliver the rewrite needs a different kind of document. This piece is that document.

What the existing platform strategy assumed

Five assumptions ran through most enterprise platform strategies of the last decade.

1. Workloads are deterministic. The same input produces the same output. Where it does not, the variance is a bug to be fixed.

2. Authentication is for users. The principal in the authorisation flow is a human. Service-to-service authentication is a special case.

3. Tools are called by code, in fixed sequences. The integration patterns are choreographed at design time. Runtime composition is rare.

4. Audit trails are about who saw what. Read access, write access, configuration change. The trail captures human-readable causation.

5. Capacity planning is about peak load. The peak is forecastable from historical patterns and grows linearly.

The agentic shift breaks all five.

What the new platform strategy has to assume

The replacement assumptions.

1. Workloads are stochastic. The same input does not produce the same output. The variance is a feature. The architecture has to support reasoning about behaviour across the distribution, not just at the modal output.

2. Authentication is for principals of multiple kinds. Humans, services, agents, customer agents, delegated principals. The authorisation flow has to support all of them, with different control surfaces for each.

3. Tools are called by agents, in sequences determined at runtime. Integration patterns are composed dynamically. The architecture has to support this without losing the safety properties.

4. Audit trails are about what the agent saw, what it considered, what it decided and why. Reasoning traces become first-class data. Storage and retrievability change.

5. Capacity planning is about reasoning depth and breadth. A single agent invocation can multiply into many tool calls and many sub-agent invocations. The capacity model has to account for this non-linearly.
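The non-linearity in the fifth assumption can be made concrete with a small worked calculation. This is a simplified model under an assumption of uniform fan-out (every invocation triggers the same number of tool calls or sub-agent invocations); real agents vary, so treat it as an upper-bound sketch. The function name and parameters are hypothetical.

```python
def total_invocations(branching: int, depth: int) -> int:
    """Invocations triggered by one request if every invocation spawns
    `branching` children, down to `depth` levels below the root."""
    return sum(branching ** level for level in range(depth + 1))


# One request with a fan-out of 3 at each of 3 levels: 1 + 3 + 9 + 27.
worst_case = total_invocations(branching=3, depth=3)
```

Under this model a fan-out of 3 over 3 levels turns one request into 40 invocations, and one extra level takes it to 121, which is why a capacity model extrapolated linearly from request counts underestimates the load.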

The five-component platform

The platform that supports these new assumptions has five components, each with a clear role and a clear ownership boundary.

Component 1: The agent runtime. Where agents execute. Loads the model, manages the reasoning loop, invokes tools, returns outputs. This is the layer with the most third-party options; the architecture function should treat it as substitutable.

Component 2: The tool gateway. Mediates every tool call from every agent. Enforces authorisation, logs the call, applies rate limits, handles failures. This is the layer with the most leverage; the architecture function should own it.

Component 3: The agent registry. Source of truth for every agent in the firm. Configuration, purpose, authorisation policy, ownership, lifecycle status. See Model and agent registries.

Component 4: The observability layer. Captures every reasoning trace, every tool call, every output. Indexed for query, retained for the regulatory window. This is the largest data layer the platform produces; the architecture function should design for the volume explicitly.

Component 5: The policy and override surface. Where humans intervene. Policy authors define what agents can do; operators override what specific agents do at runtime; auditors review what happened after the fact. Three distinct user roles, one underlying control plane.
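A minimal sketch of the tool gateway's mediation role (authorise, rate limit, log, handle failure), with illustrative names throughout. The policy shape, the `ToolGateway` class and the simple per-agent call budget are assumptions of mine; a real gateway would use time-windowed limits and a durable audit store.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGateway:
    policy: dict                 # agent id -> set of tool names it may call
    max_calls_per_agent: int     # simple call budget, not time-windowed
    tools: dict = field(default_factory=dict)        # tool name -> callable
    call_counts: dict = field(default_factory=dict)  # agent id -> calls made
    audit_log: list = field(default_factory=list)    # (agent, tool, outcome)

    def call(self, agent_id: str, tool_name: str, **kwargs):
        if tool_name not in self.policy.get(agent_id, set()):
            self.audit_log.append((agent_id, tool_name, "denied"))
            raise PermissionError(f"{agent_id} may not call {tool_name}")
        used = self.call_counts.get(agent_id, 0)
        if used >= self.max_calls_per_agent:
            self.audit_log.append((agent_id, tool_name, "rate_limited"))
            raise RuntimeError(f"{agent_id} exceeded its call budget")
        self.call_counts[agent_id] = used + 1
        try:
            result = self.tools[tool_name](**kwargs)
        except Exception:
            self.audit_log.append((agent_id, tool_name, "failed"))
            raise
        self.audit_log.append((agent_id, tool_name, "ok"))
        return result


gateway = ToolGateway(
    policy={"claims-agent": {"lookup_policy"}},
    max_calls_per_agent=1,
    tools={"lookup_policy": lambda policy_id: {"id": policy_id, "status": "active"}},
)
first = gateway.call("claims-agent", "lookup_policy", policy_id="P-1")
```

Every path through `call`, including the refusals, writes to the audit log, which is what makes the gateway the natural enforcement point for the operating envelope.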

The ownership boundaries

The platform strategy has to be explicit about who owns which component.

Architecture function: Tool gateway, agent registry, observability layer, policy and override surface. These are the platform components where firm-wide consistency matters and where the architecture function carries the authority.

AI function or use case teams: Agent runtime and the specific agents on top. These are where the use case diversity lives and where the architecture function should set standards rather than centralise delivery.

Operations function: Day-to-day operation of the platform. SLA management, incident response, capacity planning.

Compliance and risk: Policy authorship, audit review, model and agent inventory governance.

The boundaries are not always cleanly observed in practice. The architecture function's job is to clarify them and to defend them.

What goes wrong

Three failure patterns recur in firms attempting this rewrite.

The strategy is written but not staffed. The architecture function publishes the platform strategy, the leadership team endorses it, and then no team is funded to build the platform components. The use case teams build bespoke equivalents inside their own deployments. Within a year, the firm has multiple incompatible agent stacks.

The platform is built but the use cases are not governed onto it. The platform exists; the use case teams deploy outside it because the governance does not force them onto it. The platform becomes a sub-scale exhibit rather than the operating backbone.

The platform is built but the operating model is not. The platform runs; the day-to-day operations are under-resourced; incidents accumulate; trust in the platform erodes. The use case teams start building their own again.

How to deliver the rewrite

Three sequencing decisions matter.

Build the tool gateway and observability layer first. These are the highest-leverage components and the ones most expensive to retrofit. A single use case team can be the first internal customer.

Make the agent registry the source of truth before scaling. A platform that scales without the registry becomes unmanageable inside twelve months. Build the registry alongside the first production deployment.

Govern the use cases onto the platform from day one. The first three or four agent deployments establish the operating norm. If they happen outside the platform, the platform is a long-term fiction.
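A minimal sketch of the registry acting as a deployment gate, assuming a record that carries the fields listed for the agent registry earlier (purpose, authorisation policy, ownership, lifecycle status). The class names, the lifecycle states and the example agents are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Lifecycle(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    RETIRED = "retired"


@dataclass(frozen=True)
class AgentRecord:
    agent_id: str
    purpose: str
    owner: str
    allowed_tools: frozenset
    status: Lifecycle


class AgentRegistry:
    def __init__(self):
        self._records = {}

    def register(self, record: AgentRecord) -> None:
        self._records[record.agent_id] = record

    def may_deploy(self, agent_id: str) -> bool:
        """Only registered, approved agents may be deployed onto the platform."""
        record = self._records.get(agent_id)
        return record is not None and record.status is Lifecycle.APPROVED


registry = AgentRegistry()
registry.register(AgentRecord(
    "claims-triage", "Route inbound claims", "claims-team",
    frozenset({"lookup_policy"}), Lifecycle.APPROVED,
))
registry.register(AgentRecord(
    "kyc-screening", "Screen new customers", "kyc-team",
    frozenset(), Lifecycle.DRAFT,
))
```

Wiring `may_deploy` into the deployment pipeline is one way to make "governed onto the platform from day one" an enforced rule instead of a policy statement.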

Where this leaves the firm

Platform strategy for agentic AI is the highest-leverage architectural decision the firm makes in 2026. The firms that get this right will deliver multiple agent capabilities cheaply and safely over the next three years. The firms that get this wrong will spend the same period rebuilding.

Data residency for AI workloads: a working pattern for UK and EU enterprises

2026-10-01T00:00:00.000Z

BCG published "for most countries, AI sovereignty is an illusion. Resilience is real" earlier this year. The piece is a useful reframing of the public debate: pure sovereignty over AI infrastructure is, for most countries, not achievable at any reasonable cost, but operational resilience under foreign-vendor dependency is achievable and is the real engineering question.

The reframing is correct at the level it operates. The architecture function still has to translate it into specific design choices for specific workloads. This piece sets out the working pattern I use for UK and EU enterprise AI workloads.

(For the political framing of the broader debate, see my earlier piece on why sovereign AI is mostly theatre.)

The data flows that matter

Five data flows have to be modelled for any AI workload in a UK or EU enterprise.

1. Training data. Data used to train or fine-tune the model. In most regulated enterprises, this is the firm's own customer or operational data, and the residency requirements are explicit (GDPR, the FCA's operational resilience rules, the EBA's outsourcing guidance).

2. Inference input. Data sent to the model at inference time. Customer queries, transaction details, document content. The residency requirements that apply to the training data typically apply here too; the architecture sometimes forgets this.

3. Inference output. Data returned from the model. For most use cases this is derivative of the input and inherits the residency requirements, but for some use cases (synthesised content, summarisation that includes new material) the output deserves its own residency analysis.

4. Prompt and instruction data. The system prompts, the guardrail prompts, the example libraries. These are the firm's intellectual property and the firm's risk surface. They deserve their own residency treatment.

5. Audit and observability data. Logs of model calls, agent decisions, tool invocations, override events. The residency for these is sometimes treated as a technical concern; in regulated firms it is a compliance concern.

The residency decision matrix

For each of the five data flows, the firm has four deployment options.

Option A: Public API of the vendor's home region. The model runs on the vendor's infrastructure, in the vendor's home country. Lowest cost, lowest control, most permissive vendor terms.

Option B: Public API in a vendor-managed regional deployment. The vendor runs the model in an EU or UK region. Costs more, gives some residency control, typically reasonable vendor terms.

Option C: Private deployment in vendor-managed infrastructure within the firm's residency requirement. The vendor provisions dedicated capacity in the right jurisdiction. Higher cost, materially better control, custom contractual terms.

Option D: Firm-managed deployment. The firm runs the model itself, on its own infrastructure or on a hyperscaler tenancy under its control. Highest cost, highest control, full responsibility for the operational characteristics.

The matrix is not "pick one"; it is "pick one per data flow per workload". A typical regulated workload might land at Option B for inference input, Option C for training data and prompt configuration, and Option D for audit data.
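A minimal sketch of how the matrix could be recorded as data rather than as a slide. The enum names and the `unassigned_flows` helper are illustrative assumptions, not a standard. The text does not place inference output for the typical workload, so the sketch assumes it follows inference input.

```python
from enum import Enum, IntEnum


class Flow(Enum):
    TRAINING = "training data"
    INFERENCE_INPUT = "inference input"
    INFERENCE_OUTPUT = "inference output"
    PROMPT = "prompt and instruction data"
    AUDIT = "audit and observability data"


class Option(IntEnum):
    # Ordered by increasing control, matching Options A to D in the text.
    A_PUBLIC_HOME_REGION = 1
    B_PUBLIC_REGIONAL = 2
    C_PRIVATE_VENDOR_MANAGED = 3
    D_FIRM_MANAGED = 4


# The "typical regulated workload" described above. The inference output
# placement is an assumption; the text does not state it.
TYPICAL_WORKLOAD = {
    Flow.TRAINING: Option.C_PRIVATE_VENDOR_MANAGED,
    Flow.INFERENCE_INPUT: Option.B_PUBLIC_REGIONAL,
    Flow.INFERENCE_OUTPUT: Option.B_PUBLIC_REGIONAL,
    Flow.PROMPT: Option.C_PRIVATE_VENDOR_MANAGED,
    Flow.AUDIT: Option.D_FIRM_MANAGED,
}


def unassigned_flows(matrix: dict) -> list:
    """Return the flows that have no deployment option chosen yet."""
    return [flow for flow in Flow if flow not in matrix]
```

Keeping one such mapping per workload makes a missing decision visible: any flow returned by `unassigned_flows` is a flow nobody has yet made a residency choice for.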

The five design rules

Five rules I apply to every data residency design.

1. Inference data follows the regulatory framework of the customer, not the firm. A UK firm serving an Italian customer has to apply EU residency rules to that customer's data flow, regardless of where the firm is headquartered. The architecture has to be able to discriminate.

2. Audit data residency matches the regulatory retention. If the firm has to retain audit data for seven years in a defined jurisdiction, the audit data flow has to land in that jurisdiction. Vendor SLAs that move audit data to other regions for "operational purposes" are not acceptable.

3. The exit path has to be tested. Operational resilience is not theoretical. The firm has to be able to fail over from Option A to Option B, or from Option B to Option C, within a defined RTO. This needs to be exercised in production-like conditions.

4. The prompt configuration is treated as intellectual property. Where the firm has invested in prompt engineering, the prompts have to be protected as firm IP. Vendor terms that grant the vendor rights to use the prompts have to be negotiated out.

5. The model substitution path is explicit. If the chosen model is no longer available (vendor withdrawal, regulatory action, price change beyond acceptable thresholds), the workload has to be substitutable to an alternative. This is a design constraint, not an afterthought.
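Rule 1 is the one that most directly shapes code. A minimal sketch of routing by customer jurisdiction follows; the country subset, the endpoint hostnames and both function names are hypothetical, and a real implementation would take them from configuration.

```python
# Illustrative subset only; a real list would cover every EU member state.
EU_COUNTRIES = {"IT", "FR", "DE", "ES", "NL", "IE"}

# Hypothetical endpoint names, one per residency regime.
ENDPOINTS = {
    "UK": "inference.uk.example.internal",
    "EU": "inference.eu.example.internal",
}


def residency_regime(customer_country: str) -> str:
    """Pick the regime from the customer's country, not the firm's HQ."""
    country = customer_country.upper()
    if country == "GB":
        return "UK"
    if country in EU_COUNTRIES:
        return "EU"
    # Fail closed: an unmapped country is an error, not a default route.
    raise ValueError(f"no residency regime configured for {country!r}")


def inference_endpoint(customer_country: str) -> str:
    """Return the inference endpoint for this customer's regime."""
    return ENDPOINTS[residency_regime(customer_country)]
```

The design choice worth noting is the failure mode: an unknown jurisdiction raises rather than falling back to a default region, so a gap in the mapping surfaces as an error instead of as a silent residency breach.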

What the working pattern looks like

For a typical regulated UK or EU enterprise in 2026, the working pattern I see delivering results is:

Inference input and output: Vendor-managed regional deployment (Option B) for the bulk of workloads, with private deployment (Option C) for the most sensitive use cases.
Training and fine-tuning data: Private deployment (Option C) or firm-managed (Option D), depending on the sensitivity and the volume.
Prompt configuration: Private deployment or firm-managed, with explicit IP protections.
Audit and observability data: Firm-managed (Option D), in the jurisdiction of the regulatory retention requirement.

This is more expensive than the path-of-least-resistance configuration (everything on the public API), and materially less expensive than the maximalist configuration (everything firm-managed). The trade-off has to be deliberate.

Where this leaves the firm

Data residency for AI workloads is an architectural decision, not a policy decision. The architecture function has to model the data flows, choose the deployment options per flow, and design the operational resilience.

For firms doing this work in 2026, my recommendation is to model the data flows explicitly before any vendor selection, to negotiate the vendor terms against the flow analysis, and to budget for the higher-control options for the data flows that actually need them.

Model and agent registries: the missing governance artefact

2026-09-24T00:00:00.000Z

The PRA's SS1/23 is explicit: a regulated UK financial services firm must maintain a model inventory covering its material model use. The EBA guidance and the Federal Reserve's SR 11-7 say substantively the same thing in the EU and US frameworks. The model risk management function in every regulated firm now has a regulatory obligation to keep this artefact current.

In most firms, the artefact exists in some form. In few firms is it built to a standard that genuinely supports the governance the regulators expect. And in almost no firms is there an equivalent registry for agents.

This piece sets out what both registries should contain and why neither is optional in 2026.

The model registry

The model registry is the catalogue of every material model in production use. The contents the regulator expects, in my reading of the supervisory statements and the equivalent guidance:

Identification. Each model has a unique identifier, a version, and a lineage trail showing where it came from. Foundation models are identified by vendor and version; bespoke models are identified by the firm's internal versioning scheme.

Purpose and use case. What is the model used for? What decision does it support? Where in the firm's operating model does it sit?

Risk classification. What is the materiality of the model's use? What are the downside scenarios if it fails? How is the failure caught?

Validation status. Has the model been validated? When? Against what? Who signed off?

Performance monitoring. What metrics are tracked? What thresholds trigger review? What is the cadence?

Accountable senior manager. Named, in the SMCR sense or equivalent. The person who carries the regulatory accountability for this model.

Lifecycle status. In development, in pilot, in production, in deprecation, retired. Each transition has a defined approval workflow.

This list is not exhaustive. The point is that the registry is operational, not documentary. It supports ongoing governance, not just an annual exercise.
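To make "operational, not documentary" concrete, here is a minimal sketch of a registry entry whose lifecycle status can only change through an approved transition. The field names, the `ModelEntry` class and the allowed-transition table are assumptions for illustration; a firm may well permit other transitions, such as a rollback from production to pilot.

```python
from dataclasses import dataclass, field
from enum import Enum


class Lifecycle(Enum):
    DEVELOPMENT = "in development"
    PILOT = "in pilot"
    PRODUCTION = "in production"
    DEPRECATION = "in deprecation"
    RETIRED = "retired"


# Assumed forward-only transitions.
ALLOWED = {
    Lifecycle.DEVELOPMENT: {Lifecycle.PILOT, Lifecycle.RETIRED},
    Lifecycle.PILOT: {Lifecycle.PRODUCTION, Lifecycle.RETIRED},
    Lifecycle.PRODUCTION: {Lifecycle.DEPRECATION},
    Lifecycle.DEPRECATION: {Lifecycle.RETIRED},
    Lifecycle.RETIRED: set(),
}


@dataclass
class ModelEntry:
    model_id: str
    version: str
    purpose: str
    risk_class: str
    accountable_manager: str
    status: Lifecycle = Lifecycle.DEVELOPMENT
    history: list = field(default_factory=list)

    def transition(self, target: Lifecycle, approved_by: str) -> None:
        """Move to a new lifecycle status, recording who approved it."""
        if target not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {target.value} is not allowed")
        self.history.append((self.status, target, approved_by))
        self.status = target
```

Validation status and performance monitoring fields are omitted for brevity; they would sit on the same record.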

What goes wrong

Three patterns recur in firms that have built a model registry but built it poorly.

The registry is a snapshot. It is updated annually as part of the audit cycle. By the time the audit team reads it, the firm has changed model versions, deployed new use cases and retired old ones. The snapshot does not match the reality.

The registry has no policy enforcement. Adding a new model to production does not require updating the registry. The discipline relies on people remembering; in busy delivery cycles, people forget.

The registry has no observability. The registry records the metadata; the live model performance lives in operational systems and is not connected back to the registry. The accountable senior manager has no real-time view of what the registry says they own.

A registry with these characteristics is a compliance artefact, not a governance artefact. The regulators are increasingly attentive to the difference.
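The second failure pattern, no policy enforcement, is the easiest to close in code. A minimal sketch of a deployment gate follows, assuming a hypothetical registry exposed as a dictionary keyed by model identifier; the field names and the `check_deployable` function are illustrative, not taken from any particular tool.

```python
class RegistryError(Exception):
    """Raised when a deployment is not backed by a valid registry entry."""


def check_deployable(registry: dict, model_id: str, version: str) -> None:
    """Refuse a deployment unless the registry already describes it."""
    entry = registry.get(model_id)
    if entry is None:
        raise RegistryError(f"{model_id} has no registry entry")
    if entry["version"] != version:
        raise RegistryError(
            f"{model_id} registry records {entry['version']}, not {version}"
        )
    if entry["validation_status"] != "validated":
        raise RegistryError(f"{model_id} {version} is not validated")
```

Called from the release pipeline, a check of this shape means the registry is updated because the deployment cannot proceed otherwise, not because somebody remembered.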

The agent registry

An agent registry extends the model registry pattern to cover AI agents. An agent is more than a model: it has a prompt configuration, an authorisation policy, a tool inventory and an operating envelope. Each of these is material to the agent's behaviour and material to the governance.

The agent registry contents I have settled on, beyond the model registry fields:

Authorisation policy. What is the agent allowed to read, call and write? What data sources, what tools, what systems?

Prompt configuration. The system prompt, the guardrail prompts, the example library. Versioned; changes flow through change control.

Tool inventory. The specific tools the agent has access to. Mapped to the underlying systems and the policy engine entries.

Operating envelope. The volume the agent is authorised to handle, the budget cap if applicable, the escalation thresholds.

Override interface. How a human operator overrides the agent. Named operators, audit trail, escalation path.

Incident history. Every incident attributed to the agent. Material for trend analysis and for governance review.
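A minimal sketch of how those fields could sit on one record. The class and field names are assumptions for illustration; the tool inventory is represented here by `may_call`, and the escalation thresholds are left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class OperatingEnvelope:
    max_daily_requests: int
    budget_cap: Optional[float] = None  # None means no budget cap applies


@dataclass
class AgentEntry:
    agent_id: str
    model_id: str              # link back to the model registry entry
    prompt_version: str        # versioned prompt configuration
    may_read: frozenset        # data sources
    may_call: frozenset        # tool inventory
    may_write: frozenset       # target systems
    envelope: OperatingEnvelope
    override_operators: frozenset
    incidents: list = field(default_factory=list)

    def within_envelope(self, requests_today: int, spend_today: float) -> bool:
        """True while the agent is inside its authorised volume and budget."""
        if requests_today > self.envelope.max_daily_requests:
            return False
        cap = self.envelope.budget_cap
        return cap is None or spend_today <= cap
```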

Why the registry is the source of truth, not a copy

The single most important architectural decision is to make the registry the source of truth. Not a copy of data that lives elsewhere; the authoritative record that the rest of the control plane reads from.

When the policy engine makes an authorisation decision, it reads from the agent registry, not from a synchronised copy. When the audit log records an action, it tags the action against the registry entry, not against a denormalised label. When the override interface acts, it acts against the registry entry, not a copy.

This design decision sounds technical but it is the single most consequential governance decision. A registry that is a copy is a record-keeping exercise. A registry that is the source of truth is the operating backbone.
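The difference shows up clearly in a toy example. In this minimal sketch (class and method names are hypothetical), the policy engine holds a reference to the registry and consults it on every decision, so a change to the registry takes effect on the next call with no synchronisation step.

```python
class Registry:
    """Toy in-memory registry: agent id -> tools the agent may call."""

    def __init__(self):
        self._entries = {}

    def put(self, agent_id, allowed_tools):
        self._entries[agent_id] = frozenset(allowed_tools)

    def allowed_tools(self, agent_id):
        # Unknown agents get an empty set, which denies everything.
        return self._entries.get(agent_id, frozenset())


class PolicyEngine:
    def __init__(self, registry):
        self._registry = registry  # a reference to the registry, not a snapshot

    def is_allowed(self, agent_id, tool):
        return tool in self._registry.allowed_tools(agent_id)
```

Had the engine copied the entries at start-up, revoking a tool in the registry would change the record but not the behaviour, which is exactly the record-keeping failure described above.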

The implementation pattern

Five components.

The registry data store. Persistent, queryable, versioned. Supports atomic updates. Audited.

The registry API. Read and write access for the control plane components, the operations team, and the audit function. Authentication is non-negotiable.

The change control workflow. Updates to the registry flow through a defined approval process. Some changes are routine (incident logging); some require senior management sign-off (authorisation policy changes, new agents into production).

The synchronisation pattern. Where downstream systems need a local cache of registry data (for performance reasons), the synchronisation is one-way from the registry, and the local cache has a defined freshness expectation.

The audit and review function. The registry contents are reviewed periodically: monthly for lifecycle transitions, quarterly for risk-classified changes, annually for the full estate.
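The synchronisation pattern is the component most often implemented loosely, so a minimal sketch may help. The `RegistryCache` class and its parameters are assumptions for illustration. The cache exposes no write method, which keeps the flow one-way, and the caller passes the current time explicitly so the freshness rule is easy to test.

```python
class RegistryCache:
    """Read-only local cache of registry data with a freshness limit."""

    def __init__(self, fetch, max_age_seconds):
        self._fetch = fetch            # callable that reads the registry
        self._max_age = max_age_seconds
        self._items = {}               # key -> (value, fetched_at)

    def get(self, key, now):
        cached = self._items.get(key)
        if cached is not None and now - cached[1] <= self._max_age:
            return cached[0]
        # Missing or stale: go back to the registry, the source of truth.
        value = self._fetch(key)
        self._items[key] = (value, now)
        return value
```

The freshness limit is a governance parameter as much as a performance one: it is the longest period for which a revoked permission can remain in effect downstream.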

Where this leaves the firm

The model registry is a regulatory obligation. The agent registry will be, soon. Building either properly costs less than building it poorly and recovering later. The architecture function is typically the right owner.

For firms that have a thin model registry today, my recommendation is to invest in making the existing registry the source of truth before extending into the agent registry. The two layers have different content but share the same architectural pattern.

Non-human identity in the age of AI agents: an enterprise architecture pattern

2026-09-17T00:00:00.000Z

Non-human identity is the identity assigned to a system, a service, an automated process or, increasingly, an AI agent. The category has existed for decades; the treatment in most enterprises has been informal. Service accounts are created ad-hoc, shared across teams, rotated rarely, and revoked when somebody remembers to do so.

The agentic shift has made this informal treatment untenable. An AI agent is, structurally, a non-human identity. It needs to authenticate, it needs to be authorised against specific resources, it needs to be auditable, and it needs to be revocable. The legacy service-account treatment does not deliver any of these reliably.

This piece sets out the enterprise architecture pattern I use for non-human identity in environments where AI agents are deployed.

The legacy problem

Five legacy patterns recur in most enterprise estates.

Shared credentials. A service account whose password or API key is shared across multiple systems and teams. When the credential is compromised or needs rotation, identifying the dependent systems requires investigation.

Long-lived credentials. API keys that have not been rotated in years. The owner has moved on, the team has restructured, and the keys still grant production access.

Over-privileged credentials. A service account created with broad access at the time of deployment because narrowing the access was operationally expensive. The broad access persists long after the original need.

Undocumented credentials. Service accounts that exist in production but do not appear in any inventory. The accounts are discovered during audit and the ownership is contested.

Credentials without lifecycle. No defined creation process, no defined renewal cycle, no defined revocation process. The credentials persist indefinitely unless somebody actively removes them.

Each of these is a discrete control failure. Together they create the conditions in which an AI agent deployment can quietly accumulate authority well beyond its operating need.

The pattern for AI agents

Six components turn up in every working implementation.

1. The agent is a first-class identity. The agent has its own identity in the identity provider, not a shared service account. The identity is created deliberately, scoped explicitly, and lifecycle-managed.

2. The identity carries metadata. The agent identity record includes: the agent's purpose, its authorisation policy, its accountable owner, its model configuration, its lifecycle status. This metadata is the source of truth referenced by the rest of the control plane.

3. The credentials are short-lived. The agent authenticates using short-lived tokens (typically 15 minutes to a few hours) issued against the agent's identity. Long-lived API keys are rejected as a control pattern.

4. The authorisation is policy-driven. The agent's access to data, tools and systems is enforced by a policy engine that reads from the agent identity record. Changes to authorisation flow through the identity record, not through ad-hoc grant changes on the dependent systems.

5. The lifecycle is automated. Agent identity creation follows a defined approval workflow. Renewal is tied to the agent's lifecycle status. Revocation is automated when the agent reaches end-of-life or is flagged in incident response.

6. The audit is end-to-end. Every authentication, every authorisation decision, every resource access is logged against the agent identity. The audit trail is retrievable for the regulatory retention period.
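Component 3 can be illustrated with a toy token scheme. This is a sketch of the expiry logic only, using Python's standard `hmac` module; the secret, the token format and both function names are assumptions. In a real estate the identity provider issues and validates tokens (for example through an OAuth 2.0 client credentials flow), and nobody should hand-roll this.

```python
import hashlib
import hmac

SECRET = b"demo-only-secret"  # assumption: a real IDP holds and rotates its own keys
TTL_SECONDS = 15 * 60         # the lower end of the range given in the text


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(agent_id: str, now: int) -> str:
    """Issue a token bound to the agent identity that expires after the TTL."""
    payload = f"{agent_id}|{now + TTL_SECONDS}"
    return f"{payload}|{_sign(payload)}"


def verify_token(token: str, now: int):
    """Return the agent id if the token is authentic and unexpired, else None."""
    try:
        agent_id, expires, sig = token.rsplit("|", 2)
    except ValueError:
        return None
    expected = _sign(f"{agent_id}|{expires}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    if not expires.isdigit() or now >= int(expires):
        return None
    return agent_id
```

The point of the short lifetime is that a leaked token loses its value within minutes, and revocation reduces to no longer issuing new ones.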

The implementation pattern

The pattern lands in the architecture in three places.

The identity provider. The firm's existing IDP (Entra ID, Okta, Auth0, etc.) extended to support non-human principals with the metadata above. Most mature IDPs support this; the work is in the configuration and the discipline.

The policy decision point. A policy engine (OPA, Cedar, a commercial equivalent) that consumes the agent identity record and makes authorisation decisions. The engine is auditable independently of the agent runtime.

The agent platform. The agent runtime acquires short-lived tokens via the IDP, presents them at the policy decision point, and operates within the authorised envelope. The agent does not hold long-lived credentials.

The transition pattern

Most firms have legacy non-human identity that does not conform to this pattern. The transition is operationally heavy and usually staged.

Stage 1: inventory. Identify every non-human identity in the estate. Most firms find more than expected.

Stage 2: classification. Classify each identity by risk and by amenability to the new pattern. Some legacy identities will be replaced; some will be retired; some will be wrapped.

Stage 3: new identities use the new pattern. From a defined date, every new non-human identity is created under the new pattern. This includes every new AI agent.

Stage 4: high-risk legacy identities migrate. The identities with the highest access privileges and the largest blast radius are migrated first.

Stage 5: the long tail migrates over an extended timeline. Most firms will have legacy non-human identities in the estate for years. The discipline is to prevent the legacy pattern from being extended into new use cases.
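Stage 4 implies an ordering over the inventory produced in Stages 1 and 2. A minimal sketch of one possible ordering follows. The scoring fields and their scales are assumptions, and a real classification would carry more attributes than these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegacyIdentity:
    name: str
    privilege: int      # assumed scale: 1 (narrow) to 5 (broad)
    blast_radius: int   # assumed measure: count of dependent systems
    has_owner: bool


def migration_order(identities):
    """Highest privilege first, then largest blast radius; ownerless wins ties."""
    return sorted(
        identities,
        key=lambda i: (-i.privilege, -i.blast_radius, i.has_owner),
    )
```

Sorting ownerless identities ahead of owned ones on a tie reflects the undocumented-credentials problem: an account nobody claims is harder to migrate later, not easier.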

Where this leaves the firm

The non-human identity problem is one of the most under-managed control gaps in most enterprise estates. The agentic shift has elevated it from a hygiene issue to a material control. The architecture function is typically the right owner of the migration to a deliberate pattern.

For firms doing this work in 2026, my recommendation is to lock the new pattern for all new identities first, then prioritise the migration of high-risk legacy identities, and accept that the long tail will resolve over multiple years.

A reference architecture for agentic AI in the regulated enterprise

2026-09-10T00:00:00.000Z

McKinsey published "rethinking enterprise architecture for the agentic era" earlier this year. The strategic direction is right. The article does not, and could not, go to the level of architectural specificity a practising architect needs.

This piece is the working reference architecture I use when actually delivering this work in a regulated enterprise. It is calibrated against my experience as the architecture leader for an organisation operating in multiple regulated jurisdictions, plus a couple of production deployments I have shipped.

The five-layer reference

A working agentic AI architecture in a regulated firm has five layers. Each is necessary; none is sufficient on its own.

Layer 1: Foundation models. The underlying models (GPT-class, Claude-class, Gemini-class, plus domain-specific models). The firm should treat this as a substitutable layer. The architecture function's job is to make it cleanly substitutable.

Layer 2: Model serving. Where the models execute. For regulated firms, this is rarely the model vendor's public API in production; it is more often the model vendor's enterprise tenant, a private deployment, or a sovereign instance. The architecture choice here is material and is covered separately in Data residency for AI workloads.

Layer 3: Agent runtime. The orchestration layer that turns model calls into agent behaviour. Tool calling, memory, multi-step reasoning, observability. The firm should own this layer; outsourcing it to a single vendor creates lock-in.

Layer 4: Tool gateway. The mediated interface between the agents and the firm's existing systems. Policy enforcement, audit logging, rate limiting, authentication. This is where MCP integration lands in practice. See MCP is the most important enterprise standard nobody is implementing.

Layer 5: Domain applications. The agents themselves, calibrated to specific business processes. Each agent has a defined purpose, a defined authority boundary, a defined operating envelope. This is where the value is captured and where the use case diversity lives.

The platform components

Across the five layers, six platform components recur in every deployment.

The agent registry. Single canonical record of every agent in the firm. Purpose, owner, authorisation policy, model and prompt configuration, lifecycle status, deployment environments, accountable senior manager. The registry is the source of truth; nothing operates in production without an entry.

The policy engine. Authorisation decisions are made by the policy engine. Agents request access; the engine allows or denies. The engine reads from the agent registry and from the firm's broader authorisation policy. Auditable independently of the agent.

The audit log. Immutable record of every action taken by every agent. Tamper-evident, queryable, retained for the regulatory retention period. The audit log is the firm's primary evidence in the event of an incident or a regulatory query.

The override interface. Human operators can override agent decisions. The override is logged as a first-class event, attributed to the named operator, and reviewed periodically for systemic patterns.

The monitoring and observability layer. Real-time visibility into agent behaviour. Anomaly detection. Performance monitoring. SLA compliance. Cost attribution.

The incident workflow. When something goes wrong (an unexpected tool call, a quality threshold breach, a customer complaint that traces back to agent behaviour), the incident workflow notifies, captures and resolves.
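"Tamper-evident" has a simple structural meaning that a short example can show. In this minimal sketch each audit record carries a hash of the previous record, so any later change to an earlier record breaks the chain. The record layout and function names are assumptions; production systems typically add signing and write-once storage on top.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first record


def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event, chaining it to the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_chain(log: list) -> bool:
    """Recompute every hash; False means some record was altered or removed."""
    prev_hash = GENESIS
    for record in log:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```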

The non-negotiable design principles

Four principles in the reference architecture I will not compromise on.

1. The agent has no authority that has not been explicitly granted. Default deny. Every action the agent can take is enumerable from the agent registry. If the registry says no, the policy engine says no.

2. The audit log is immutable and is generated by the platform, not the agent. The agent cannot decide what to log. The platform observes and logs.

3. The override path is documented and tested. A human operator can stop an agent within an explicit SLA. The path is exercised in production-like conditions regularly.

4. The accountability is named. Each agent has a named senior manager. SMCR or equivalent already requires this in regulated firms; the architecture reinforces it.
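Principles 1 and 2 can be shown together in a few lines. This is a minimal sketch, with a hypothetical `ToolGateway` class standing in for the platform: the grants are assumed to come from the agent registry, anything not granted is denied, and the audit record is written by the gateway before the outcome is known to the agent.

```python
class ToolGateway:
    def __init__(self, grants, tools):
        self._grants = grants   # agent id -> set of tool names, from the registry
        self._tools = tools     # tool name -> callable
        self.audit = []         # written by the gateway, never by the agent

    def call(self, agent_id, tool, *args):
        # Default deny: an unknown agent or an ungranted tool is refused.
        allowed = tool in self._grants.get(agent_id, set())
        # Denied attempts are logged too; the agent has no say in this.
        self.audit.append({"agent": agent_id, "tool": tool, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{agent_id} is not granted {tool}")
        return self._tools[tool](*args)
```

Because the agent only ever reaches tools through `call`, the set of actions it can take is exactly the set enumerable from its grants.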

The trade-offs the reference does not resolve

Three trade-offs are firm-specific and the reference intentionally leaves them open.

Build vs buy on the agent runtime. A bespoke runtime gives full control and full operating cost. A vendor runtime gives faster time-to-value and a vendor dependency. The right answer depends on the firm's existing engineering capacity, the strategic importance of the agent capability, and the firm's vendor risk posture.

Centralised vs federated platform ownership. A centralised platform team owns the agent platform and serves the use case teams. A federated model gives each use case team its own platform stack with central standards. The trade-off is consistency vs autonomy. Most regulated firms should start centralised and relax over time.

MCP-native vs custom integration. This choice lands at the tool gateway (Layer 4) and is covered in the MCP piece.

Where this leaves the firm

The reference architecture above is what I have seen work in practice. It is not the only configuration that works; it is the one I have the most confidence in for regulated environments.

For firms starting this work in 2026, my recommendation is to invest in the platform components (Layer 4 and the registry, policy engine, audit log) before the first agent goes into production. Building the platform afterwards is significantly more expensive than building it first.

Banking and financial services architecture: top trends 2026

2026-09-03T00:00:00.000Z

Capgemini publishes its top-trends series for banking, insurance and financial services each year. BCG runs its AI Radar. PwC publishes its UK financial services regulatory commentary. McKinsey publishes its banking agentic AI work. The set of pieces is consistent and broadly agrees on the strategic agenda.

The architecture function's read of the same operating environment is different. The advisor writes for the chief executive, the chief risk officer, the chief financial officer. The architect writes for the people who have to deliver the systems that will make any of this real.

This piece is the architect's view of the 2026 agenda.

Trend 1: regulatory engagement shifts upstream

The FCA, PRA and EBA have moved noticeably faster on AI, operational resilience and outsourcing in the last eighteen months. The architecture function's involvement in regulatory engagement is moving from "after-the-fact review" to "design-phase consultation". Firms that have not made this shift are paying for it through extended implementation timelines.

What to do: embed the regulatory function into the architecture review process, not the other way around.

Trend 2: agentic AI moves from pilot to production

The pilot programmes of 2024 and 2025 are now in production in 2026. The production environment surfaces problems the pilot environment did not: scaling cost, audit trail discipline, vendor lock-in, change control. The architecture function carries more of this load than the 2024 sales pitch implied.

What to do: budget for steady-state operating cost of agent infrastructure (registry, policy engine, audit trail, override interface) before scaling beyond the pilot footprint.

Trend 3: the core banking and policy admin platform modernisation cycle is accelerating

The legacy mainframe estate that survived the last modernisation cycle (2010s) is now under genuine pressure. The agentic shift has changed the integration demands; the regulatory shift has changed the data-residency demands; the cost-base pressure has changed the executive appetite for the modernisation programme.

What to do: separate the modernisation business case from the AI investment business case. Treat them as sequenced rather than combined. The combined business case is too brittle to defend through the inevitable re-baselining cycles.

Trend 4: the architecture function shifts from "reviewer" to "delivery owner" on agent capabilities

In 2024, the architecture function reviewed the agent deployments after the AI function had built them. In 2026, the architecture function in well-run firms owns the agent platform: registry, policy engine, tool gateway. The AI function owns the use cases on top.

What to do: clarify the ownership boundary explicitly. If the architecture function does not own the platform, the firm will have multiple bespoke implementations within twelve months.

Trend 5: MCP and equivalent standards become material to vendor selection

Foundation model providers, vertical AI tools and SaaS platforms with embedded AI are being assessed against their interoperability with MCP and equivalent standards. The firms that get this right preserve optionality; the firms that do not are committing to vendor-specific integration that becomes expensive to unwind.

What to do: include interoperability standards compliance in the vendor assessment criteria. See MCP is the most important enterprise standard nobody is implementing.

Trend 6: data residency and sovereignty become architecture decisions, not just policy decisions

The data residency requirements for financial services have firmed up in both the UK and EU. The architecture function has to design for residency, not just declare it in policy. The implications cascade through model hosting, vector database location, audit trail storage and recovery infrastructure.

What to do: model the residency requirements at the data-flow level, not just at the data-class level.

Trend 7: model risk management becomes a steady-state discipline

SS1/23 and the equivalent EBA guidance now apply to AI models in production. Model risk management is no longer a project-phase concern; it is a steady-state operating discipline. The architecture function carries significant weight in maintaining the model inventory, the validation evidence and the performance monitoring.

What to do: fund the model risk management function properly. The model inventory needs ongoing engineering support, not just policy support.

Where this leaves the firm

The 2026 agenda is more operationally weighty than the 2024 agenda. The pieces that worked as pilots have to work as production. The pieces that worked at the strategic narrative layer have to work at the systems layer.

The firms that will land 2026 well are the ones whose architecture function has the seniority, the funding and the authority to carry this. The firms where the architecture function is a service provider to other functions will struggle.

How AI is reshaping the compliance function: an architect's view

2026-08-27T00:00:00.000Z

KPMG published "how AI is poised to reshape compliance functions" earlier this year. The piece argues that the compliance function is one of the highest-value candidates for agentic AI augmentation: high volume of structured work, clear rules, auditable outcomes, and acute pressure on cost.

The argument is correct at the level the article operates. The architecture function's read of the same material is more specific: which compliance workflows are genuinely amenable to agent support, which require explicit guardrails, and where the audit-trail design choices land.

The workflows that are obvious candidates

Three workflow shapes are well suited to agent support.

Document review against a policy. The agent reads a document (a contract, a marketing claim, a customer communication, a transaction record), compares it against a defined policy, and flags compliance issues. The agent's output is a recommendation; the human compliance officer signs off. This shape is broadly mature; multiple production deployments exist.

Filing preparation. The agent assembles a regulatory filing from underlying data sources, formats it according to the regulator's published rules, and prepares it for human review. The human reviews, adjusts where needed, and submits. Most of the heavy lift sits in the data assembly and the format compliance; the agent does well on both.

Customer complaint triage. The agent reads a customer complaint, classifies it against the firm's complaint taxonomy, routes it to the appropriate handler and drafts an initial response. The human handler reviews the draft and sends. This shape is operationally mature in retail financial services.

The workflows that require explicit guardrails

Three workflow shapes are more delicate.

Final-decision workflows. Where the compliance function makes a final decision (a suspicious activity report determination, a sanctions screening match adjudication, a regulatory breach finding), the agent should support but not decide. The architecture has to make this distinction explicit: the human signs the decision, the agent provides the reasoning trail, and the audit log records both.

Investigation workflows. The agent assembles material relevant to a compliance investigation. The risk is that the agent's selection biases the investigation. The mitigation is in the audit log: the investigator can see exactly what the agent considered and what it did not.

Cross-customer pattern detection. Where the agent operates across customer data (transaction monitoring, market abuse detection), the data residency and access controls become more demanding. The architecture has to respect the data segregation the firm has committed to in its regulatory filings.

The workflows the architecture function should resist

Two workflow shapes I would currently keep agents out of in regulated firms.

Sanctions list matching. The downside of a false negative is large; the matching rules are precise; the existing systems already perform well. The marginal value of an agent layer is small and the risk of introducing soft errors is real. Stay with rule-based systems with human review on close matches.

Senior management attestation. Where SMCR or equivalent regulation requires named senior manager attestation, an agent should not be drafting that attestation. The attestation is the named manager's direct statement; tooling can support the data gathering but should not draft the statement itself.

The audit-trail problem

The single largest architecture decision is the audit trail. In a compliance workflow, the trail has to support three audiences over the regulatory retention period:

The internal audit function, reviewing periodically
The external auditor, reviewing annually
The regulator, reviewing on inspection or after an incident

The trail has to capture what the agent saw, what it recommended, what the human reviewed, what the human decided, and any divergence between the recommendation and the decision. The retention period is typically five to seven years and may extend to ten in some regulatory contexts.

Most existing systems do not log at this granularity. The architecture function has to specify the logging contract before deployment, and the operational discipline to maintain it has to be funded.
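A logging contract of this kind can be written down as a record type before any system is built. The sketch below is one possible shape. The class name, the field names and the seven-year default are assumptions; the text gives a retention range of five to ten years depending on context.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReviewRecord:
    case_id: str
    agent_inputs: tuple          # references to what the agent saw
    agent_recommendation: str
    human_reviewer: str
    human_reviewed: tuple        # references to what the human actually opened
    human_decision: str
    decided_on: date
    retention_years: int = 7     # assumed default

    @property
    def diverged(self) -> bool:
        """True when the human decision differs from the agent recommendation."""
        return self.human_decision != self.agent_recommendation

    def retain_until(self) -> date:
        year = self.decided_on.year + self.retention_years
        try:
            return self.decided_on.replace(year=year)
        except ValueError:  # 29 February in a non-leap target year
            return self.decided_on.replace(year=year, day=28)
```

Recording what the human actually reviewed separately from what the agent saw is what lets an auditor distinguish a considered sign-off from a rubber stamp.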

The operating model implication

A working AI-augmented compliance function has three roles the firm may not currently have.

Agent operator. The human in the loop. Reviews recommendations, decides outcomes, captures rationale. This is an evolved compliance officer role, not a new one.

Agent supervisor. Reviews agent performance, identifies systematic errors, manages the model and prompt configuration. This is closer to a quant role than a compliance role; the firm has to source it carefully.

Accountable senior manager. SMCR or equivalent already requires this. The named senior manager has to have visibility of how the AI-augmented workflows operate and has to be able to defend the design choices in front of the regulator.

The architecture function should be designing the operating model alongside the technical architecture, not afterwards.

Where this leaves the firm

AI in compliance is a real opportunity. The architecture choices determine whether it lands as a productivity gain or as a regulatory exposure. The firms that get this right are the ones where the architecture function treats the compliance use case with the same rigour as any other regulated workflow: explicit guardrails, defensible audit trails, named accountabilities and deliberate operating model design.

Cyber guardrails for AI agents in regulated workflows: a reference architecture

2026-08-20T00:00:00.000Z

EY published "reimagine your cyber guardrails to accelerate AI value" in early 2026. The piece argues that conventional cyber controls were designed for human-driven workflows and need adaptation for the agentic era. The strategic argument is correct. The architecture function still has to translate that into specific controls.

This piece sets out the reference architecture I use for AI agents operating inside regulated workflows. The focus is on the layer where the architecture function actually has design choice: not the model itself, not the business process, but the control surface in between.

The four guardrail categories

Every agent in a regulated workflow needs controls in four categories. The categories are not independent; they reinforce each other and have to be designed together.

1. Identity guardrails. The agent has to have a distinct identity, separate from the human operator who configured it or the customer it acts for. The identity has to be auditable, has to support authorisation policy, and has to support revocation. See Non-human identity in the age of AI agents.

2. Authority guardrails. The agent's authority has to be bounded. It can read from a defined set of data sources. It can call a defined set of APIs. It can write to a defined set of systems. It can spend a defined budget. Each of these has to be explicit in the authorisation policy and enforced at the runtime boundary, not just at the agent configuration layer.

3. Observation guardrails. Every action the agent takes has to be observable. The observation has to be sufficient to reconstruct the agent's reasoning, not just its output. This is where the audit trail design sits.

4. Reversal guardrails. Where the agent's actions can be reversed (financial transactions, customer communications, system changes), the reversal path has to be designed alongside the forward path. Where the actions cannot be reversed (cryptographic operations, external API calls, regulatory submissions), the agent should not be allowed to take them without explicit human-in-the-loop confirmation.
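The reversal guardrail is the easiest of the four to illustrate in code. The sketch below is a simplified assumption of how the rule could be expressed: the action names and the `ACTION_CLASSES` table are hypothetical, and a real firm would maintain its own classification.

```python
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"


# Hypothetical classification of action types, for illustration only.
ACTION_CLASSES = {
    "refund_payment": Reversibility.REVERSIBLE,
    "update_customer_record": Reversibility.REVERSIBLE,
    "submit_regulatory_return": Reversibility.IRREVERSIBLE,
    "call_external_api": Reversibility.IRREVERSIBLE,
}


def may_execute(action: str, human_confirmed: bool) -> bool:
    """Return True if the agent may take the action now."""
    # Unclassified actions are treated as irreversible, so the check fails closed.
    kind = ACTION_CLASSES.get(action, Reversibility.IRREVERSIBLE)
    if kind is Reversibility.REVERSIBLE:
        return True
    return human_confirmed
```

The design choice worth noting is the default: an action nobody has classified is handled as irreversible until someone decides otherwise.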

The runtime architecture

Six components turn up in every working implementation.

Agent registry. Each agent has a record. The record includes the agent's purpose, its authorisation policy, its accountable senior manager, its model and prompt configuration, its lifecycle status, and its incident history.

Policy engine. Authorisation decisions are made by the policy engine, not by the agent itself. The agent makes a request; the policy engine returns allow or deny. The policy engine is auditable independently of the agent.

Tool gateway. Agents do not call tools directly. They call the tool gateway, which enforces the policy, logs the call and forwards to the underlying tool if allowed. This is where MCP integrations land in practice.

Audit log. Every action, every decision, every tool call lands in an immutable audit log. The log is queryable, retrievable for the regulatory retention period, and tamper-evident.

Override interface. Human operators can override agent decisions. The override is logged, named, and auditable as a first-class event.

Incident workflow. When an agent does something unexpected (a tool call denied, an unusual reasoning trace, a quality threshold breach), the incident workflow notifies the accountable senior manager and captures the root cause.
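To make the relationship between the policy engine, the tool gateway and the audit log concrete, here is a deliberately small sketch. The class names, the policy shape (agent id mapped to a set of tool names) and the example tools are all assumptions for illustration; an in-memory list is also not an immutable log, it only stands in for one.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyEngine:
    # Hypothetical policy shape: agent id -> set of tool names the agent may call.
    allowed_tools: dict

    def decide(self, agent_id: str, tool: str) -> bool:
        return tool in self.allowed_tools.get(agent_id, set())


@dataclass
class ToolGateway:
    policy: PolicyEngine
    tools: dict                                  # tool name -> callable
    audit_log: list = field(default_factory=list)

    def call(self, agent_id: str, tool: str, **kwargs):
        allowed = self.policy.decide(agent_id, tool)
        # Every attempt is logged, allowed or denied, before the tool is touched.
        self.audit_log.append(
            {"agent": agent_id, "tool": tool, "args": kwargs, "allowed": allowed}
        )
        if not allowed:
            raise PermissionError(f"{agent_id} may not call {tool}")
        return self.tools[tool](**kwargs)


gateway = ToolGateway(
    policy=PolicyEngine({"kyc-agent": {"lookup_customer"}}),
    tools={
        "lookup_customer": lambda customer_id: {"id": customer_id, "status": "active"},
        "close_account": lambda customer_id: "closed",
    },
)
result = gateway.call("kyc-agent", "lookup_customer", customer_id="C42")
```

The agent never holds a reference to the tool itself, so a misconfigured agent cannot bypass the decision or the log entry.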

The threat model

Three threats matter and have to be designed against.

Prompt injection. An agent reading customer-provided content (an email, a document, a chat message) can be manipulated by carefully crafted content into taking actions the operator did not intend. The mitigation is in the authority guardrail (the agent does not have authority to do dangerous things in the first place) and in the observation guardrail (unusual actions trigger review before completion).

Tool confusion. An agent in a complex environment with many tools can call the wrong tool against the wrong data. The mitigation is in the policy engine (strict scope enforcement) and in the tool gateway (per-tool monitoring for anomalous call patterns).

Cascading agent calls. Agents calling other agents can create dependency chains that are hard to audit. The mitigation is in the audit log (the full chain has to be reconstructable) and in the policy engine (chain depth is bounded).
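Bounding chain depth is simple to express once every agent-to-agent call carries its lineage. The sketch below assumes a hypothetical `CallContext` that is passed along with each call; the limit of three is arbitrary and would be a firm-specific decision.

```python
from dataclasses import dataclass

MAX_CHAIN_DEPTH = 3  # illustrative limit only


@dataclass(frozen=True)
class CallContext:
    chain: tuple  # agent ids, from the originating agent to the current one

    def delegate(self, callee: str) -> "CallContext":
        """Return the context for a call from the current agent to `callee`."""
        if len(self.chain) >= MAX_CHAIN_DEPTH:
            raise PermissionError(
                f"chain depth {MAX_CHAIN_DEPTH} exceeded: {' -> '.join(self.chain)}"
            )
        return CallContext(self.chain + (callee,))


ctx = CallContext(("intake-agent",))
ctx = ctx.delegate("screening-agent").delegate("sanctions-agent")
```

Because the full chain travels with the call, the same structure serves both mitigations: the policy engine can refuse a further hop, and the audit log can record the whole path.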

Where this lands in delivery

A reference architecture is only useful if it can be delivered. The architectures I see working in practice share three characteristics.

The architecture is platformised. The agent registry, policy engine, tool gateway and audit log are shared services across the firm's agents, not bespoke to each use case. The cost-to-build of the first agent is high; the cost-to-build of the tenth agent is modest.

The threat model is documented. The threats above and the firm-specific additions are explicit. Each guardrail control is mapped to the threats it addresses. The mapping is reviewed periodically.

The accountability is named. Each agent has an accountable senior manager. The SMCR framework already requires this for regulated UK firms; the architecture should reinforce it rather than work around it.
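The control-to-threat mapping mentioned above is easy to keep honest if it is held as data and checked mechanically. The mapping below is a sketch that follows the mitigations described in the threat model; the control names are my own labels, not a standard taxonomy.

```python
THREATS = {"prompt_injection", "tool_confusion", "cascading_agent_calls"}

# Hypothetical control names, each mapped to the threats it addresses.
CONTROL_MAP = {
    "authority_guardrail": {"prompt_injection"},
    "observation_guardrail": {"prompt_injection"},
    "policy_engine_scope": {"tool_confusion", "cascading_agent_calls"},
    "tool_gateway_monitoring": {"tool_confusion"},
    "audit_log_chain": {"cascading_agent_calls"},
}


def uncovered_threats(threats: set, control_map: dict) -> set:
    """Return the threats that no control claims to address."""
    covered = set().union(*control_map.values())
    return threats - covered
```

A periodic review can then start from the output of `uncovered_threats` whenever a firm-specific threat is added to the set.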

Agentic commerce: the integration architecture nobody is talking about

2026-08-13T00:00:00.000Z

The consultancy discourse on agentic commerce has, in the last twelve months, settled into a familiar shape. Accenture's "dawn of the agentic deal" and "agentic commerce" pieces. PwC's "real change agents". BCG's $200 billion agentic AI opportunity for technology service providers. McKinsey's banking and marketing workflow pieces. Every framing is at the value layer: what the agents will do, what the business model looks like, how the firm captures the upside.

The integration architecture underneath this is where the actual work sits. That layer is conspicuously absent from the public commentary.

This piece is for the architects who have to build it.

What agentic commerce actually requires

The shape is straightforward once you specify it. An agentic commerce flow involves an agent (either acting for the customer or acting for the firm) that:

Discovers what is available
Negotiates terms
Confirms intent
Triggers fulfilment
Settles payment

Each step has to interact with at least one back-office system. In a regulated firm, each step also has to leave an audit trail, has to respect customer consent, has to support reversibility, and has to be observable in production.

Most enterprise estates are not built to support this flow.

The five integration layers

Five layers turn up in every working agentic commerce implementation.

1. The catalogue layer. Whatever the agent discovers, it needs structured access to. For most firms, this means an MCP-compliant catalogue server that exposes products, prices, availability, terms and constraints in a format agents can read directly. The existing e-commerce APIs are usually not sufficient; they are designed for browser-driven UI, not agent-driven exploration.

2. The negotiation layer. Where the agent has authority to negotiate (volume discounts, structured terms, custom payment arrangements), the negotiation has to be bounded. The boundaries are commercial decisions the firm has to make explicitly and that the architecture has to enforce. An agent that can offer arbitrary discounts will, eventually, offer arbitrary discounts.

3. The consent and confirmation layer. Before the transaction commits, the customer (or the customer's agent, if delegated) has to have confirmed. The confirmation has to be cryptographically auditable, has to be linked to the specific transaction, and has to be retrievable on demand for the regulatory retention period.

4. The fulfilment layer. The transaction triggers back-office processes. For physical goods, this means the inventory and logistics systems. For services, this means the service-provisioning systems. For financial products, this means the booking and settlement systems. Each of these has to accept agent-originated requests and treat them with the same rigour as human-originated requests.

5. The settlement layer. Payment has to settle. For regulated firms, this includes KYC and AML checks even where the customer is an existing customer. The settlement has to be reversible during the regulatory window for dispute, and the reversal has to flow back through all four upstream layers cleanly.
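Of the five layers, the negotiation boundary is the one most easily shown in a few lines. The sketch below assumes discount caps per customer tier; the tier names and percentages are invented for the example, and the point is only that the cap is enforced outside the agent.

```python
from decimal import Decimal

# Hypothetical commercial limits, set by the business rather than by the agent.
MAX_DISCOUNT = {"standard": Decimal("0.05"), "volume": Decimal("0.15")}


def approve_discount(customer_tier: str, requested: Decimal) -> Decimal:
    """Return the discount the agent may offer, never exceeding the tier cap."""
    if requested < 0:
        raise ValueError("discount cannot be negative")
    # An unknown tier gets no discount at all.
    cap = MAX_DISCOUNT.get(customer_tier, Decimal("0"))
    return min(requested, cap)
```

For example, an agent asking for 40% on the `volume` tier would be held to the 15% cap, whatever its prompt or reasoning said.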

Where the architecture choices land

Three choices determine whether the implementation works or breaks under load.

Catalogue: MCP or custom? The MCP standard is maturing. A firm building a custom catalogue interface for agents is building a bespoke layer that will need revisiting in 12-18 months. A firm building against MCP is betting on a standard that may or may not stick. The defensible position, in my view, is to expose both: a custom interface for the firm's own agent stack and an MCP-compliant interface for third-party agents.

Negotiation: in-band or out-of-band? A negotiation that happens inside the catalogue interaction is operationally cleaner but commercially more constrained. A negotiation that happens out-of-band (the agent contacts a sales workflow that may include human intervention) is commercially more flexible but operationally harder to audit. The choice depends on the firm's product mix and the regulatory envelope.

Settlement: same-rails or new-rails? Settling agent transactions on the same rails as human transactions keeps the back office consistent but inherits the back-office's existing constraints. Building new rails (agent-specific settlement) keeps the agent flow clean but creates a second class of transaction that has to be reconciled. Most firms should default to same-rails with explicit agent-flag instrumentation.

The audit trail problem

In a regulated firm, every step of an agentic commerce flow has to be auditable. The trail has to capture:

What the agent saw at each step (catalogue contents, terms, availability)
What the agent recommended or attempted
What the customer or principal confirmed
What the back-office systems received and processed
What the settlement layer cleared

Most existing systems do not log at the granularity this requires. The architecture function has to specify the logging contract before deployment, not retrofit it afterwards.
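One common way to make such a trail tamper-evident is to chain each entry to the hash of the one before it. The sketch below uses only the Python standard library; the step names and payloads are illustrative, and a production log would also need durable storage and access controls that this example ignores.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(step: str, payload: dict, prev_hash: str) -> str:
    body = {"step": step, "payload": payload, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, step: str, payload: dict) -> dict:
    """Append an entry whose hash covers its content and the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "step": step,
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(step, payload, prev_hash),
    }
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash; an edited, removed or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        expected = _digest(entry["step"], entry["payload"], entry["prev_hash"])
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


trail = []
append_entry(trail, "agent_saw", {"sku": "A1", "price": "100.00"})
append_entry(trail, "agent_attempted", {"action": "order", "qty": 2})
append_entry(trail, "principal_confirmed", {"confirmed": True})
```

Calling `verify(trail)` after any later change to a stored payload returns `False`, which is the property an auditor needs from the log.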

What this looks like in practice

I have been building a related system (CANVAS, the internal application and vendor approval workflow at Sonnedix) over the last 18 months. It is not exactly agentic commerce but it shares the architectural shape: agent-mediated decisions with full audit trails and reversibility. See the CANVAS case study for the underlying patterns.

Where this leaves the firm

Agentic commerce is operationally heavier than the consultancy framing implies. The value capture is real; the integration work to get there is non-trivial; the architecture choices have multi-year consequences.

For firms that are starting this work in 2026, my recommendation is to invest in the catalogue layer first (MCP-compliant where possible), to build the audit trail contract before the first agent goes into production, and to treat the negotiation layer as a deliberate commercial decision rather than a default vendor configuration.

S/4HANA in the agentic era: where the enterprise architecture function sits

2026-08-06T00:00:00.000Z

The major consultancies have, over the last twelve months, all converged on the same theme. McKinsey formalised its SAP alliance and ran the Value Finder work; BCG announced its Conduct partnership in May 2026 specifically targeted at AI-driven ERP transformation; Accenture continues to run the largest SAP practice in the world; EY published the "S/4HANA transformation success: the human factor" piece a few months back.

The collective message: ERP transformation is being reshaped by the agentic AI shift, and the firms that get this right will do meaningfully better than the firms that treat the AI layer and the ERP layer as separate programmes.

The architecture function's read of the same shift is different. This piece sets out where the EA function sits when an S/4HANA programme has to land in an agentic environment.

What the consultancy framing gets right

Two things, in my reading.

The integration layer is the leverage point. The single biggest mistake in a legacy ERP estate is to treat the ERP as a black box that exposes a small set of interfaces and otherwise stays untouched. The agentic shift makes this position untenable: agents need to read from and write to the ERP in ways the original integration design never anticipated. The firms that will deliver the value are the ones that re-architect the integration layer deliberately.

The data layer is the second leverage point. S/4HANA moves the firm to a single in-memory data platform. The firms that deliver the value treat this as the foundation for the analytical layer, not just the operational one. The agentic shift compounds the value: agents reading from a clean canonical data layer operate noticeably better than agents reading from a fragmented one.

What the consultancy framing misses

Three things, in my experience of delivering the architecture of a CHF 350M+ S/4HANA programme.

The change control layer is structurally underweighted. Most S/4HANA programmes treat change control as a project-phase concern. In the agentic era, change control becomes a steady-state concern: every model update, every agent capability change, every integration change has to flow through governance that accounts for the AI-specific risks. The firms that will get this right are building this discipline into the operating model from day one, not retrofitting it.

The exit path is rarely modelled. A firm that deploys agents against S/4HANA has, in practice, chosen a coupling between the SAP estate, the chosen model provider and the chosen agent framework. The cost of switching any of those after the fact is non-trivial. The architecture function should be modelling the exit paths during design, not after.

The lights-on cost of the agentic layer is missing. The consultancy commentary covers the value capture from agents; it is largely silent on the operating cost. In a regulated firm, the agentic layer adds: ongoing model inventory maintenance, ongoing prompt governance, ongoing exit path testing, ongoing model performance monitoring, ongoing audit trail review. These are non-trivial steady-state costs the EA function has to budget for.

Where the EA function sits in the programme

Four explicit responsibilities.

1. The integration architecture. The EA function owns the design choice between embedded SAP agents, agents that call into S/4HANA via the SAP-published APIs, and agents that read from a derived data layer. The trade-offs are real and they affect the operating model for years.

2. The data model. The S/4HANA data model becomes the operational data model. The EA function owns the question of what gets canonical status, what gets derived, what gets duplicated and what gets archived. The agentic layer compounds the importance of this choice.

3. The agent capability boundary. Which business processes does the firm allow agents to operate against? Which require human-in-the-loop? Which are agent-blocked entirely? The EA function should be authoring this, not inheriting it from the AI vendor's default configuration.

4. The vendor lock-in posture. The S/4HANA programme locks the firm to SAP. The agentic layer can either compound that lock-in or partially offset it depending on where the integration sits and which standards the agentic layer uses (MCP being the most relevant). The EA function should be running this trade-off explicitly.
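The agent capability boundary in particular benefits from being written down as data rather than left in a slide. A minimal sketch follows, with hypothetical process names and assignments; the only design decision it encodes is that unclassified processes are blocked.

```python
from enum import Enum


class AgentMode(Enum):
    AUTONOMOUS = "agent may act"
    HUMAN_IN_LOOP = "agent proposes, human confirms"
    BLOCKED = "no agent access"


# Hypothetical process names and assignments, for illustration only.
CAPABILITY_BOUNDARY = {
    "invoice_matching": AgentMode.AUTONOMOUS,
    "vendor_master_change": AgentMode.HUMAN_IN_LOOP,
    "payment_run_release": AgentMode.BLOCKED,
}


def mode_for(process: str) -> AgentMode:
    # A process nobody has classified is blocked by default.
    return CAPABILITY_BOUNDARY.get(process, AgentMode.BLOCKED)
```

Held this way, the boundary can be versioned, reviewed and enforced at the integration layer instead of being inherited from a vendor default.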

What the right operating model looks like

The S/4HANA programmes that land well in the agentic era share three characteristics.

A small architecture cell embedded in the programme, with explicit authority over the integration, data and agent boundary decisions. Not a steering committee; a working group with delivery authority.

A documented architecture position that the programme delivers against. Updated periodically, but not re-litigated continuously. Most programmes underweight this.

A regulatory and risk function that engages with the architecture position rather than reviewing it at end of phase. The agentic layer makes after-the-fact review materially worse than concurrent engagement.

Where this leaves the firm

ERP transformation in the agentic era is a meaningfully different programme from ERP transformation five years ago. The EA function has more weight to carry; the integration and data choices have larger downstream implications; the steady-state cost of governance is higher.

For the firms doing this work now, my recommendation is to invest disproportionately in the architecture cell during the design phase, document the position carefully, and budget for the steady-state cost of agentic governance from day one.

What UK financial services regulation means for AI architecture in 2026

2026-07-30T00:00:00.000Z

The UK financial services regulators have moved noticeably faster on AI in the last 18 months than the consensus expected. The posture set out in Discussion Paper 5/22, published jointly by the Bank of England, the PRA and the FCA, has firmed up through the SS1/23 model risk management supervisory statement and the FCA's 2025-26 AI strategy. Most of the EY, KPMG, Deloitte and PwC commentary on this is calibrated to the advisor's audience: the board, the executive committee, the chief risk officer.

The architecture function lives one layer further in. The architects building the systems that will or will not be compliant with this regulatory envelope need a different read of the same material. This piece is for that audience.

What the regulators have actually said

Three things have crystallised.

Model risk management applies to AI. SS1/23 confirms that machine learning models, including generative AI where used in regulated activities, fall within the scope of model risk management. The implications are specific: model inventory, model validation, model performance monitoring and governance escalation all apply, and the firm's three lines of defence have to adjust.

Outsourcing rules apply to AI vendors. The FCA and PRA are explicit that an AI service provider (whether that is a foundation model API, a vertical AI tool, or an embedded AI feature in a SaaS product) is an operational outsource arrangement. The firm has to do the same vendor due diligence, the same exit planning and the same operational resilience analysis it does for any other material outsource.

Senior management responsibility is named. SMCR already places accountability with named senior managers for material risks. The regulator has been clear that AI risk is a material risk; the accountability is allocated, and the architecture function's design choices are auditable against that accountability.

These three together set the operating envelope for any AI deployment in a regulated UK firm.

What the architecture function has to do

Six implications.

1. Maintain a live model inventory. Every AI system in the firm has to be in a register. The register has to include the model, the use case, the data sources, the human-in-the-loop arrangements, the validation status and the named senior manager accountable. This isn't a one-off document; it is a continuously maintained artefact, and the architecture function is typically the owner.

2. Design for auditable decision trails. Where an AI system contributes to a regulated decision (customer onboarding, credit, suitability, complaints handling, trading), the trail of inputs, model outputs and human override has to be auditable for the regulatory retention period. This sits on top of conventional logging and requires deliberate design.

3. Treat AI vendor selection as an outsource decision. Foundation model APIs are operational outsources. The architecture function should be running them through the firm's outsource framework rather than the technology procurement framework. The two have materially different gates.

4. Build exit paths. The outsource framework requires demonstrable exit paths. For a foundation model provider that means an alternative provider has to be viable, the firm's prompts and data sets have to be portable, and the operational continuity in a provider-failure scenario has to be tested. Most firms have not done this work.

5. Plan for the EU AI Act overlap. UK firms with EU customers operate in two regulatory envelopes. The EU AI Act's high-risk system requirements apply where the firm's AI system is used to deliver services to EU customers. The architecture function has to design for the more demanding of the two regimes, not the easier one. See Data residency for AI workloads.

6. Get ahead of MCP and agent governance. Agent interoperability standards (MCP in particular) are maturing faster than the regulatory commentary. A firm that deploys agents without explicit governance over which tools they can call, against which data, with what authority, is exposed. The architecture function should be the source of this governance, not the legal function. See Auditing agent decisions and MCP is the most important enterprise standard nobody is implementing.
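The model inventory in the first implication can be sketched as a typed register entry with a completeness check. The field names mirror the list given there; the class name, the example values and the model identifier are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelRegisterEntry:
    system_name: str
    model: str
    use_case: str
    data_sources: list
    human_in_the_loop: str
    validation_status: str
    accountable_senior_manager: str


def missing_fields(entry: ModelRegisterEntry) -> list:
    """Return the names of empty fields, so an incomplete entry can be rejected."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


entry = ModelRegisterEntry(
    system_name="complaints-triage",
    model="vendor-llm-2026-05",            # hypothetical model identifier
    use_case="complaints handling",
    data_sources=["crm", "complaints-inbox"],
    human_in_the_loop="officer approves every outcome",
    validation_status="validated",
    accountable_senior_manager="",          # deliberately left empty
)
```

Here `missing_fields(entry)` reports the absent senior manager, which is the kind of gap a continuously maintained register should surface automatically.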

What this looks like in delivery

In practice, the firms doing this well share four characteristics.

A senior architect with explicit accountability for the AI risk envelope. Not a chief AI officer; a chief architect who treats AI risk as part of the broader architecture remit.

A model inventory that is updated as part of the change process, not as a standalone exercise. The change control workflow refuses to release a system that changes the AI use case without an updated register entry.

An outsource gate that AI vendors actually pass through. The technology team can recommend; the outsource committee approves; the architecture function provides the technical assessment.

A regulatory radar wired into the architecture function. When the FCA publishes a new portfolio letter or the PRA issues a new supervisory statement, the architecture function reads it, assesses it against the firm's estate, and tables an impact paper to the relevant committee.
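The second characteristic, a change process that refuses to release without an updated register entry, reduces to a small gate. The sketch below assumes a change request and a register held as plain dictionaries with the hypothetical keys shown; real change tooling will differ.

```python
from datetime import date


def release_allowed(change: dict, register: dict) -> bool:
    """Block a release that changes an AI use case unless the register was updated for it."""
    if not change["changes_ai_use_case"]:
        return True
    entry = register.get(change["system"])
    # The entry must exist and must have been updated on or after the change was raised.
    return entry is not None and entry["last_updated"] >= change["raised_on"]


register = {"complaints-triage": {"last_updated": date(2026, 3, 1)}}
change = {
    "system": "complaints-triage",
    "changes_ai_use_case": True,
    "raised_on": date(2026, 6, 10),
}
```

With the values above the release is refused; it passes once the register entry carries a date on or after 10 June 2026.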

Where this leaves the firm

The UK regulatory posture on AI is, on the whole, proportionate. It is not designed to prevent firms from deploying AI; it is designed to make them deploy it carefully. The architecture function is the part of the firm best placed to deliver that carefulness in practice.

For more on the broader operating model implications, see also A reference architecture for agentic AI in the regulated enterprise, How AI is reshaping the compliance function: an architect's view, and the existing pieces on auditing agent decisions and Cursor in a regulated industry.

Event-driven architecture: when it adds value, and when it doesn't

2026-07-23T00:00:00.000Z

Executive summary

Event-driven architecture has become, over the past decade, one of the more confidently recommended patterns in modern technical practice. The general advice — that systems should communicate through events rather than direct synchronous calls — is now widely adopted and substantially codified in the technical literature.

The reality across enterprise contexts is more nuanced. Event-driven architecture, applied with discipline in the right context, delivers material benefits in decoupling, resilience, and scalability. Applied without that discipline, or applied to contexts where it does not fit, it introduces a category of complexity that organisations underestimate at the outset and pay for over years.

This piece sets out a framework for assessing when an event-driven approach genuinely adds value, when a simpler synchronous design would have been the right choice, and what the architectural indicators are for each. It is not an argument against event-driven architecture; it is an argument for selecting it deliberately rather than adopting it as the default.

Where the pattern came from, and why it has been over-applied

Event-driven architecture, in its modern form, emerged from several parallel developments. The growth of large-scale internet platforms demonstrated the limits of tightly coupled synchronous architectures. The emergence of mature message broker technology — Apache Kafka in particular — made event streaming practical at enterprise scale. The broader move to microservices created a category of inter-service communication problems that event-driven patterns were well suited to address. And the influential body of writing from companies that had successfully scaled their architectures using event-driven patterns established the credibility of the approach.

The result, by the mid-2020s, was a strong default in favour of event-driven architecture in new system design. This default has produced both genuine benefits and, in my observation, a substantial amount of over-application.

The over-application is not surprising. The benefits of event-driven architecture are visible in headline scenarios — the canonical case studies from large-scale platforms — and the costs are diffused across the operational lifecycle of the system. The right comparison is rarely available at the design stage: a synchronous version of the same system, running in the same context, with which the event-driven version can be benchmarked. Without that comparison, the recommendation to adopt the pattern looks costless. It is not.

The five contexts where event-driven architecture adds value

The pattern adds genuine value in specific contexts. Five of them recur across the enterprises I have observed.

1. Asynchronous, long-running business processes

When a business process has steps that are inherently long-running — typically because they depend on external systems, on human action, or on temporal triggers — modelling the process as an event-driven workflow is materially cleaner than the synchronous alternative.

The architectural marker is a process where one or more steps may take seconds to days, where the calling system has no reasonable basis for blocking on the result, and where the eventual completion needs to trigger downstream actions. A vendor onboarding workflow, an insurance claim assessment, a customer credit check, a multi-stage fulfilment process: each of these is a natural fit for an event-driven design.

The wrong alternative in this context is typically a polling approach, where the calling system repeatedly checks the status of the long-running process. Polling is workable at small scale and becomes operationally fragile at larger scale. The event-driven alternative is cleaner.
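The difference from polling can be shown with a toy in-process event bus. This is a sketch under obvious simplifications: the `EventBus` class and the event name are hypothetical, and a real implementation would sit on a broker rather than a dictionary of callbacks.

```python
from collections import defaultdict


class EventBus:
    """Minimal synchronous publish/subscribe, for illustration only."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type: str, handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()
actions = []

# Downstream steps register interest once; nothing checks status in a loop.
bus.subscribe("credit_check.completed", lambda e: actions.append(f"provision:{e['applicant']}"))
bus.subscribe("credit_check.completed", lambda e: actions.append(f"notify:{e['applicant']}"))

# Whenever the long-running step finishes, it publishes its completion.
bus.publish("credit_check.completed", {"applicant": "A-17", "outcome": "approved"})
```

The calling system does no work between the request and the completion event, which is exactly what a polling design cannot offer.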

2. Multi-consumer data distribution

When the same data needs to be distributed to multiple downstream systems, each of which consumes it for different purposes and on different cadences, an event-driven approach offers significant architectural advantages over the alternative.

The architectural marker is a set of systems all dependent on a common data source — typically a system of record, such as a customer master, a product catalogue, an order book — where each consuming system has its own data model, its own consumption frequency, and its own latency tolerance.

The pattern that emerges in this context is the publication of domain events from the system of record onto a durable event log, with each downstream consumer subscribing to the events relevant to its purpose. The system of record does not need to know about the consumers. The consumers do not need to coordinate with each other. New consumers can be added without changes to the producer. This is a class of decoupling that synchronous architectures genuinely cannot match.
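A sketch of the durable-log pattern follows, with an in-memory list standing in for the broker topic. The `EventLog` and `Consumer` classes are hypothetical; the point is that each consumer holds its own offset, so consumers read at their own cadence and the producer knows nothing about them.

```python
class EventLog:
    """Append-only in-memory log; a stand-in for a durable broker topic."""

    def __init__(self):
        self._events = []

    def append(self, event: dict) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset: int) -> list:
        return self._events[offset:]


class Consumer:
    def __init__(self, log: EventLog, handler):
        self.log = log
        self.handler = handler
        self.offset = 0  # each consumer tracks its own position

    def poll(self) -> None:
        for event in self.log.read_from(self.offset):
            self.handler(event)
            self.offset += 1


log = EventLog()
billing_seen, crm_seen = [], []
billing = Consumer(log, billing_seen.append)
crm = Consumer(log, crm_seen.append)

log.append({"type": "customer.created", "id": "C1"})
log.append({"type": "customer.updated", "id": "C1"})
billing.poll()                      # billing reads promptly
log.append({"type": "customer.created", "id": "C2"})
crm.poll()                          # CRM reads later and catches up in one pass
```

Adding a third consumer means constructing another `Consumer`; nothing on the producing side changes.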

3. Audit, observability and replay requirements

In contexts where the system needs to maintain a complete, replayable history of significant events — for audit, for analytical purposes, for the ability to reconstruct system state at a prior point in time — an event-driven architecture using an immutable event log is a natural fit.

This is particularly relevant in regulated industries, where the audit story matters substantially. An event-sourced subsystem provides a complete, append-only record of every change to its state. The state itself is derivable from the event log at any point. The audit requirement is satisfied as a property of the architecture rather than as a separate logging concern.

The marker for this context is a regulatory or operational requirement to demonstrate the complete history of a particular domain — financial transactions, regulatory submissions, clinical decisions, vendor risk assessments. In each case, the cost of the event-sourcing pattern is justified by the audit story it provides.
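The claim that state is derivable from the event log at any point can be illustrated with a toy ledger. The event types and the sequence-number field are assumptions for the example, not a prescribed event schema.

```python
def apply_event(balance: int, event: dict) -> int:
    """Fold one event into the running balance."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    raise ValueError(f"unknown event type: {event['type']}")


def state_at(events: list, as_of: int) -> int:
    """Rebuild the balance from events with sequence number <= as_of."""
    balance = 0
    for event in events:
        if event["seq"] > as_of:
            break
        balance = apply_event(balance, event)
    return balance


ledger = [
    {"seq": 1, "type": "deposited", "amount": 100},
    {"seq": 2, "type": "withdrawn", "amount": 30},
    {"seq": 3, "type": "deposited", "amount": 50},
]
```

Asking for `state_at(ledger, 2)` replays only the first two events, which is how an auditor's question about the position at a prior date gets answered without a separate history table.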

4. Cross-organisational integration with limited coordination

When systems owned by different organisations need to exchange data or trigger actions in each other's domains — partner APIs, multi-enterprise supply chains, regulator-to-firm submissions, inter-bank transactions — event-driven patterns reduce the coordination cost meaningfully.

The synchronous alternative requires that each integration is designed around the specifics of the partner system, with each endpoint negotiated, each schema versioned bilaterally, and each change managed through bilateral discussion. The event-driven alternative, particularly using a published industry-standard schema or a mediating event hub, allows each party to evolve their internal systems with greater independence.

This is a context where the gain is largely organisational rather than technical. The technical complexity of event-driven integration is non-trivial. The reduction in coordination overhead is what justifies it.

5. Genuine scale and throughput requirements

When the volume of inter-system communication is such that the synchronous alternative would impose unsustainable operational demands — typically measured in events per second rather than in business transactions per day — event-driven architecture is the appropriate response.

The architectural marker is a system whose throughput requirements exceed what a synchronous design could comfortably sustain on the available infrastructure. Telemetry pipelines, financial market data, large-scale logistics tracking, IoT sensor data — each of these is a context where the volume itself justifies the event-driven design.

In these contexts the choice is not really event-driven versus synchronous; it is which event-driven design to adopt. The synchronous alternative is not viable at the required throughput.

The three contexts where event-driven architecture is the wrong choice

The counterpart to the above: three contexts where, in my observation, the synchronous alternative would have been the better choice and the event-driven design has caused problems that are still being absorbed.

1. Simple request-response interactions

The most common over-application is the use of event-driven patterns for what is, in essence, a simple request-response interaction. The calling system needs the response of the downstream system to proceed. The downstream system can respond synchronously in a small number of milliseconds. The semantics of the interaction are straightforward call-and-return.

In this context, modelling the interaction as an event-driven exchange introduces several costs without commensurate benefit. The latency increases, because the request has to traverse the event bus rather than a direct call. The error handling becomes more complex, because the caller now has to handle the possibility that the response never arrives. The operational dependency on the event broker becomes a single point of failure that the simpler synchronous design would not have introduced.

The architectural marker for this anti-pattern is a system where the calling code, in effect, has to wait for the response anyway, either through correlation IDs and asynchronous waits or through explicit polling, so that the event-driven machinery has become a layer of complication wrapped around what is still a synchronous interaction.

The recommendation in this context is to use synchronous calls and to accept the coupling. The coupling is real but typically modest in this kind of interaction, and the operational cost of the event-driven alternative substantially exceeds the cost of the direct dependency.
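To make the marker concrete, here is a minimal sketch of the same lookup written both ways. Everything in it is a stand-in: `price_lookup` is a hypothetical downstream service, and an in-process queue plays the part of the event broker. It is an illustration under those assumptions, not a recommended implementation.

```python
import queue
import threading
import uuid


def price_lookup(sku):
    """Hypothetical downstream service that answers in-process."""
    return {"A-100": 12.5}.get(sku, 0.0)


def get_price_sync(sku):
    """Synchronous design: one call, one return value."""
    return price_lookup(sku)


request_topic = queue.Queue()  # stands in for a request topic on the broker
replies = {}                   # correlation ID -> reply queue


def worker():
    """Consumer: reads request events and publishes correlated replies."""
    while True:
        event = request_topic.get()
        reply_queue = replies.get(event["correlation_id"])
        if reply_queue is not None:  # the caller may already have timed out
            reply_queue.put(price_lookup(event["sku"]))


def get_price_via_events(sku, timeout_s=1.0):
    """Event-driven design wrapped around the same interaction."""
    correlation_id = str(uuid.uuid4())
    replies[correlation_id] = queue.Queue()
    request_topic.put({"correlation_id": correlation_id, "sku": sku})
    try:
        # The caller blocks anyway, and must now handle a reply that never arrives.
        return replies[correlation_id].get(timeout=timeout_s)
    except queue.Empty:
        raise TimeoutError("no reply for " + correlation_id) from None
    finally:
        del replies[correlation_id]


threading.Thread(target=worker, daemon=True).start()
sync_price = get_price_sync("A-100")
event_price = get_price_via_events("A-100")
```

Both functions return the same value. The second needs a correlation ID, a reply channel, a timeout path and a running consumer to do so, which is the complication the marker describes.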

2. Transactional consistency requirements

When the business semantics of an interaction require that multiple state changes either all happen or none of them happen — the classic atomic transaction — event-driven architecture introduces a category of complexity that organisations consistently underestimate.

The synchronous alternative, particularly within a single database, provides atomic transactions as a property of the underlying system. Two updates within the same transaction either both commit or both roll back. The application code does not need to model the failure scenarios in detail; the database handles them.
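A minimal sketch of that baseline property, using Python's built-in `sqlite3` module. The `account` table and the `transfer` function are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds between two rows; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(conn, "a", "b", 30)       # both updates commit
failed = transfer(conn, "a", "b", 500)  # debit violates the CHECK; the credit is rolled back
```

The application code contains no logic for undoing the first update when the second fails. The database provides that.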

The event-driven alternative replaces this with a saga pattern, in which the equivalent atomicity is achieved through a sequence of events with compensating actions. Saga patterns are well-documented and well-understood as a concept. They are also operationally demanding to implement correctly, and the failure scenarios they need to model are numerous.
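For contrast, a deliberately simplified saga sketch. It runs in a single process with no broker, and the step names (`reserve_stock`, `take_payment`) are hypothetical. A production saga would also have to handle compensations that fail, duplicate delivery and partial visibility of intermediate state, none of which is modelled here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]      # forward operation
    compensate: Callable[[dict], None]  # undo for a completed forward operation


def run_saga(steps, state):
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done = []
    for step in steps:
        try:
            step.action(state)
            done.append(step)
        except Exception:
            for finished in reversed(done):
                finished.compensate(state)
            return False
    return True


def reserve_stock(s):
    s["stock"] -= 1


def release_stock(s):
    s["stock"] += 1


def take_payment(s):
    if s["card_declined"]:
        raise RuntimeError("payment declined")
    s["paid"] = True


def refund_payment(s):
    s["paid"] = False


state = {"stock": 5, "paid": False, "card_declined": True}
completed = run_saga(
    [Step("reserve", reserve_stock, release_stock),
     Step("pay", take_payment, refund_payment)],
    state,
)
```

Even in this toy form, every forward step needs a hand-written inverse, and the orchestration has to track what has completed. That is the work the single-database transaction does not require.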

The architectural marker for this anti-pattern is a system where the engineering team is spending substantial effort on the design, testing, and operational handling of compensating actions to maintain a property that a synchronous database transaction would have provided as a baseline.

The recommendation in this context is to keep the transactional interaction within a single bounded context, served by a single database with synchronous transactions, and to use event-driven patterns only for the genuinely cross-context interactions where the coupling cost would otherwise be high.

3. Small-team contexts with limited operational maturity

The third anti-pattern is less about the workload characteristics and more about the organisational context. Event-driven architecture requires a meaningful investment in the operational platform — the event broker, the monitoring infrastructure, the schema registry, the dead-letter queue handling, the replay tooling. The investment is appropriate at sufficient scale and with sufficient organisational maturity. It is disproportionate at smaller scale or in less mature contexts.

The architectural marker for this anti-pattern is a small engineering team, often early in its operational maturity, that has adopted event-driven patterns by default and is spending a meaningful proportion of its time on platform issues rather than on the business problem the systems are meant to solve.

The recommendation in this context is to start with a synchronous architecture, to maintain a clear set of internal interfaces along domain boundaries, and to migrate to event-driven patterns only when the specific need arises and the operational platform exists to support it. Event-driven architecture is a destination some systems should reach. It is rarely the right starting point.

A practical assessment framework

For architecture leaders evaluating whether to adopt an event-driven approach for a specific system or system boundary, a small set of questions can structure the decision.

Question	If "yes"	If "no"
Is the interaction inherently asynchronous, long-running, or temporally decoupled?	Event-driven design likely justified.	Synchronous likely simpler.
Are there multiple downstream consumers of the same data?	Event-driven enables decoupling benefit.	Synchronous typically adequate.
Is there a regulatory or audit requirement for a complete event history?	Event-sourcing pattern likely valuable.	Standard logging typically sufficient.
Does the throughput exceed what synchronous infrastructure can comfortably support?	Event-driven is required, not optional.	Synchronous remains viable.
Do you have the operational maturity to run an event-driven platform reliably?	Proceed if other answers warrant.	Defer event-driven adoption until the platform investment is justified.
Does the interaction require transactional consistency across the systems involved?	Be cautious of saga complexity.	Event-driven is likely a clean fit.

A system that answers "yes" to several of the first four questions and has the operational maturity for the fifth is likely a good fit for an event-driven design. A system that answers "yes" to the sixth and "no" to the others is likely better served by a synchronous approach.

The framework is not a decision tree in any rigorous sense. It is a structured way to surface the considerations that, in practice, are too often skipped at the design stage.
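For teams that want the questions in a reviewable form, the following sketch encodes them as a small function. The field names, the threshold of two "yes" answers, and the wording of the outputs are my assumptions for illustration. The output is a prompt for discussion, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    asynchronous_or_long_running: bool     # question 1
    multiple_consumers: bool               # question 2
    audit_history_required: bool           # question 3
    throughput_exceeds_sync: bool          # question 4
    operationally_mature: bool             # question 5
    needs_transactional_consistency: bool  # question 6


def recommend(a):
    """Summarise the six answers as a starting position for the design review."""
    fit_signals = sum([
        a.asynchronous_or_long_running,
        a.multiple_consumers,
        a.audit_history_required,
        a.throughput_exceeds_sync,
    ])
    if fit_signals == 0:
        return "synchronous"
    if not a.operationally_mature:
        return "defer event-driven until the platform investment is justified"
    if fit_signals >= 2:  # assumed reading of "several"
        note = " (watch saga complexity)" if a.needs_transactional_consistency else ""
        return "event-driven likely a good fit" + note
    return "judgement call"
```

A single "yes" among the first four questions deliberately returns "judgement call": the table gives a lean for each question but does not say how one positive answer weighs against the rest.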

Implications for architecture leaders

Three broader implications.

The architecture function should resist the default toward event-driven patterns. The general technical literature, the vendor narratives, and the broader practitioner conversation all tend toward recommending event-driven approaches. The architecture function's role is to apply judgement to that recommendation in the specific context of the organisation and the specific workload. This is not a fashionable position, but it is, in my observation, the position that produces better outcomes.

The operational maturity question is more important than the technical-fit question. The technical fit for event-driven architecture is generally easier to assess than the operational maturity required to run it. The technical fit determines whether the pattern can produce value. The operational maturity determines whether it will. Organisations that adopt event-driven patterns before establishing the operational platform consistently underestimate the cost.

The pattern should be revisited at major architectural milestones. Systems evolve. A system that did not warrant event-driven design at inception may warrant it as it grows. A system that was designed event-driven may be carrying complexity that is no longer justified. The architecture function should treat the choice as reversible at major milestones — at significant scale changes, at major reorganisations, at the end of programme phases — and should revisit it rather than treating it as a once-and-done decision.

Closing

Event-driven architecture is a powerful tool when applied to the right problem. It is also one of the patterns most prone to over-application in current practice, and the operational cost of that over-application accumulates across the lifecycle of the systems it affects.

For architecture leaders, the recommendation is to retain the pattern in the toolkit, to apply it where the context warrants, and to maintain the discipline to choose a simpler synchronous design where it does not. The choice between event-driven and synchronous is not a question of which is better in the abstract. It is a question of which is appropriate to the specific system, the specific workload, and the specific organisation. The framework above is one way of structuring that question.

Lessons from large-scale ERP transformation: an architect's perspective

2026-07-16T00:00:00.000Z

Executive summary

Large-scale ERP transformation — the multi-year, multi-region, nine-figure programmes that migrate a global organisation to a modern ERP platform — remains one of the most complex undertakings an enterprise architecture function can lead. The published commentary on these programmes tends to fall into two categories: vendor narratives, which emphasise the destination at the expense of the journey, and consultancy retrospectives, which emphasise the methodology at the expense of the architectural detail.

What is rarely discussed in either category is what the architecture function actually grapples with during the programme — the specific decisions, the moments where the architectural posture matters most, and the lessons that translate across organisations and across ERP product lines. This piece is an attempt to fill that gap, drawn from leading the global architecture for an S/4HANA transformation of nine-figure scale across multiple regions.

Five lessons. None of them are surprises individually. The combination is what produces the difference between a programme that delivers the architectural foundation the organisation will use for fifteen years and a programme that delivers a system that runs but does not provide that foundation.

Context

For the avoidance of doubt: the lessons that follow are drawn from a programme that ran over multiple years, across more than fifteen country deployments, with a total programme value in the hundreds of millions, on SAP's S/4HANA platform. The specifics of the organisation are not relevant to the lessons; the lessons themselves are. Where the lesson is specific to SAP or to S/4HANA, this is called out. Where it is more general, it generalises to other large-scale ERP platforms (Oracle Cloud ERP, Workday for finance, Microsoft Dynamics 365 for finance and operations) and to the broader class of multi-year platform programmes.

Lesson 1: The architectural foundations laid in the first six months determine the next ten years

The single most consequential set of decisions in a large-scale ERP programme is the foundational architectural choices made during the design and blueprinting phase, typically in the first six months of the programme.

These decisions are well-rehearsed at the level of headline choices — template versus country-by-country design, single instance versus multiple instances, public versus private cloud, on-premise versus hosted, the depth of the deployment model — and the programme governance generally treats them as the major decision points they are. What is less well-recognised is the long tail of architectural decisions that follow from each headline choice, each of which constrains the next decade of evolution.

The decision to deploy a global template, for instance, carries with it implicit choices about the depth of the master data hierarchy, the granularity of organisational units, the design of the chart of accounts, the structure of material masters and customer masters, and dozens of similar foundational data structures. Each of these choices is technically reversible after go-live but is, in practice, prohibitively expensive to change. The combined set establishes the shape of the architectural ground on which the organisation will build for the next decade.

The lesson is that the architectural rigour applied to these foundational decisions needs to be substantially greater than the rigour typically applied. Not in the sense of more meetings or heavier documentation, but in the sense of explicit consideration of the long-term consequences of each decision, articulated in a form that is understandable to non-architects and that survives the inevitable personnel changes through the programme's lifetime.

For architecture leaders entering a programme at this scale, the recommendation is to establish, before the design phase begins, a small set of architectural principles that the foundational decisions will be tested against. These principles should be written down, agreed at executive level, and revisited explicitly at each foundational decision point. Principles that are obvious in the abstract become decisive when applied to specific design choices under time pressure.

Lesson 2: The integration architecture is more important than the ERP itself

A consistent observation across large-scale ERP programmes is the disproportionate amount of value, and the disproportionate amount of risk, that sits in the integration architecture rather than in the ERP itself.

The ERP is a packaged product. Its capabilities are largely defined by the vendor's design. The organisation's design freedom is in how it configures the product, not in what the product does. The integration architecture, by contrast, is bespoke. It connects the ERP to dozens, sometimes hundreds, of surrounding systems — the CRM, the e-commerce platform, the manufacturing execution systems, the customer-facing portals, the analytics platform, the regulatory reporting systems, the data warehouses. The shape of these integrations is the organisation's choice and the organisation's responsibility.

In programmes that go well, the integration architecture is treated as a first-class deliverable, with named ownership, formal governance, explicit standards, and rigorous testing. In programmes that go poorly, the integration architecture is treated as a series of necessary plumbing exercises, delegated to the systems integrator, and discovered to be the source of operational issues only after go-live.

The lesson, in practical terms, is to invest disproportionately in the integration architecture during the design phase, to maintain a single integration architect of sufficient seniority across the full programme lifecycle, and to ensure that the integration design decisions are surfaced to the same governance forum as the ERP design decisions. The integration architecture is not an implementation detail.

For organisations on S/4HANA specifically, this lesson applies with particular force because the move to S/4HANA from a prior ECC environment typically requires meaningful redesign of the existing integration pattern. The HANA-native database structures, the adoption of CDS views as the data access pattern, the BTP-based integration approach, and the broader move away from older middleware patterns all combine to make the integration architecture genuinely new rather than incrementally updated.

Lesson 3: The data is harder than the process

In every large-scale ERP programme I have observed or been part of, the data migration and the master data design have absorbed significantly more effort, surfaced significantly more issues, and created significantly more delay than the process design.

This is counter-intuitive at the outset. The headline framing of an ERP programme is about business processes — order-to-cash, procure-to-pay, record-to-report, hire-to-retire. The implication is that the work is process design and configuration. In practice, once the process design is settled, configuration is a comparatively mechanical exercise. The difficult work is the data.

The difficulty arises from several sources. Historical data quality is almost always worse than the legacy systems' apparent state suggests. The reconciliation of master data across multiple legacy systems, often each with its own slightly different version of the same customer, product or supplier, is intricate and politically fraught. The design of the new master data hierarchies is a place where the organisational politics of how the business is structured become visible, and where the architectural decision is constrained by the operating model decision. And the migration itself, when it finally happens, requires sustained attention to quality at a level that few programmes plan for adequately.

The lesson is that data should be treated as a first-class workstream from the beginning of the programme, with its own architect, its own governance, and its own quality measurement. The data workstream is not a sub-task of the process workstream. The relative weighting of effort, in a well-run programme, is closer to a 60-40 split between data and process than the 20-80 split that the early programme planning typically assumes.

For SAP S/4HANA programmes specifically, this lesson is compounded by the discipline that S/4HANA's data model imposes — the move from the more permissive ECC data model to the stricter S/4HANA structures means that data quality issues which were tolerable in the legacy estate become blocking issues in the new platform. Programmes that underestimate this typically discover it during the first cutover, which is the worst possible moment to discover it.

Lesson 4: The governance model has to be designed for years, not for the programme

Large-scale ERP programmes are typically governed through a programme structure — a steering committee, a programme board, workstream leads, design authorities — that is appropriate for the programme's duration but is rarely fit for the post-programme operating model. When the programme concludes, the governance structure dissolves, and the architecture function is left with an inadequately designed ongoing model.

This is the source of a familiar pattern: the platform is delivered to a high standard, the organisation goes live successfully, and within eighteen months the platform begins to accumulate decisions that do not fit the original architectural intent because the governance forum that would have prevented them no longer exists in its programme form.

The lesson is that the programme governance model must be designed with the post-programme operating model in mind. The Design Authority that governs decisions during the programme should evolve into the Architecture Governance Board that governs decisions after the programme. The standards and patterns established during the programme should be documented in a form that survives the programme's conclusion and is maintained by a named function.

In a programme I led, this transition was planned eighteen months before go-live, with the post-programme governance structure explicitly designed and the transition of named roles into the post-programme model documented as part of the programme closure. The pattern proved durable. The platform's architectural integrity was preserved through the first three years of operation under the governance model that the programme had established. This is not the typical outcome.

The recommendation for architecture leaders is to make the post-programme operating model an explicit programme deliverable, to design it with the same rigour as any other architectural design, and to ensure that the transition is treated as a critical milestone in the programme plan, not as a clean-up activity after go-live.

Lesson 5: The change management story is an architectural concern

The framing of change management as a separate workstream from architecture is, in my view, increasingly unhelpful. The two are deeply intertwined and the architectural decisions made during the programme have direct implications for the change management challenge.

A platform that adopts the vendor's standard process where the organisation's current process is materially different will require a much larger change management effort than a platform that has been configured to accommodate the existing process. The architectural choice between standardising on the vendor's process and customising to the existing process is, in part, a change management choice disguised as an architectural one.

The lesson is that the architecture function should be involved in the change management strategy, not as a peripheral input but as a central voice. The architectural choices about template adherence, configuration depth, the degree of process harmonisation, and the phasing of country deployments each have direct implications for the magnitude of the change management challenge and the likelihood of successful adoption.

A practical pattern that works well is the establishment of a joint architecture-and-change forum, meeting on a defined cadence during the programme, where the architectural decisions are tested against their change management implications and vice versa. This forum serves as a check on the natural tendency of the architecture function to favour cleaner technical designs at the expense of adoption, and on the natural tendency of the change management function to favour minimal disruption at the expense of long-term technical health.

Implications for transformation leaders

Three broader implications for executives sponsoring or leading transformation programmes of this scale.

The architecture function is not a supporting function in these programmes. It is the function that determines whether the investment produces the foundation the organisation will operate on for the next decade. The seniority, the authority, and the durability of the architecture function across the programme lifecycle are not implementation details; they are programme success factors.

The selection of the systems integrator should be informed by their architecture leadership, not by their workforce capacity. The systems integrator's commercial proposition is typically framed around capacity — the number of consultants, the day rates, the project management methodology. The actual differentiator across systems integrators in large-scale ERP work is the calibre of their lead architects and the depth of their architectural practice. This should be weighted heavily in the selection process.

The programme should expect to revisit foundational decisions at defined points. Some of the foundational decisions made in the first six months will look incorrect with the benefit of two years of programme experience. The governance model should include defined review points at which the foundational decisions are re-examined and, where appropriate, formally revised before the cost of the original decision compounds further. This is counter-cultural in programmes that are pressured to maintain forward momentum, but the cost of revisiting a foundational decision at month eighteen is, in almost every case, smaller than the cost of carrying it through to go-live and beyond.

Closing

Large-scale ERP transformation remains, in 2026, one of the most demanding undertakings an enterprise will take on. The vendor products have matured; the implementation methodologies have become more disciplined; the cloud-based delivery patterns have removed some of the historical friction. The fundamental challenges — the foundational decisions, the integration architecture, the data, the governance model, the change management — remain.

The lessons above are not the only ones I would draw from leading work at this scale. They are the ones that I have not seen written about with the directness that, in my view, they merit. For architecture leaders entering or running programmes of this kind, I hope they are useful. The work is hard and the published guidance is, in places, less honest about the difficulty than the work itself deserves.

Identity-first security: rethinking the enterprise perimeter in 2026

2026-07-09T00:00:00.000Z

Executive summary

The shift from network-perimeter security to identity-centric security has been underway for the better part of a decade. The trend is not new. What is new in 2026 is the degree to which the shift is now structurally complete in most regulated enterprises, the architectural implications that flow from this, and the specific patterns that distinguish organisations that have made the transition well from those still in the middle of it.

This piece sets out the current state of identity-first security architecture, the five implications that architecture leaders should be working through, and a framework for assessing the maturity of an organisation's identity posture.

It is not a treatment of identity technology selection. The question of which identity provider to deploy is well covered elsewhere, and the answer in 2026 is for most organisations a choice among a small number of well-established providers (Microsoft Entra ID, Okta, Ping Identity, the AWS-native option for cloud-first shops, and a small number of open-source alternatives for specific contexts). The interesting questions are about how the architecture function exercises the identity-first posture, not about which product enables it.

The structural shift

The network perimeter — the firewall around the data centre, the VPN as the controlled access path, the implicit trust granted to traffic that originated inside the corporate network — was the dominant security control for forty years. It is no longer.

Several developments have combined to produce this outcome. The adoption of public cloud means a meaningful proportion of the estate is, by definition, outside any network perimeter the organisation controls. The widespread move to SaaS for business applications removes those systems from the perimeter entirely. The structural shift to hybrid working has eliminated the assumption that "user inside the network" is a meaningful concept. The growing use of third-party integrations, partner APIs, and managed services has multiplied the number of legitimate connections that cross what used to be the perimeter. And the consistent record of perimeter breaches — where attackers reach the network interior and then move laterally with the implicit trust the network model granted — has made the perimeter-trust model demonstrably unsafe.

The replacement, broadly described as zero trust or as identity-first security, treats every access decision as an explicit authorisation event. The trust granted to any given request is calculated based on the identity of the requesting principal, the device that principal is using, the context of the request, and the sensitivity of the resource being accessed. The network from which the request originates is, at most, one factor among several, and is treated as untrusted by default.
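A minimal sketch of what "calculated" can mean in practice. The signal names, the weights and the three-level sensitivity scale are all illustrative assumptions. Real policy engines use richer signals and vendor-specific models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    principal_authenticated: bool  # identity verified by the identity provider
    mfa_used: bool
    device_managed: bool           # device posture signal
    on_corporate_network: bool     # one factor among several
    resource_sensitivity: int      # 1 (low) to 3 (high); hypothetical scale


def trust_score(req):
    """Combine the signals into a score; the equal weights are an assumption."""
    if not req.principal_authenticated:
        return 0  # no verified identity, no trust, regardless of network location
    score = 1
    score += 1 if req.mfa_used else 0
    score += 1 if req.device_managed else 0
    score += 1 if req.on_corporate_network else 0
    return score


def authorise(req):
    """Require more accumulated trust for more sensitive resources."""
    return trust_score(req) >= req.resource_sensitivity
```

The point of the sketch is the first branch: a request from inside the corporate network with no verified identity scores zero, which is the inversion of the perimeter model.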

This is the model that the major enterprise security frameworks have now substantively adopted — the NIST zero trust architecture guidance, the UK National Cyber Security Centre's design principles, the various sector-specific overlays from the FCA, the PRA, and equivalent bodies in other jurisdictions. Identity, in this model, is the primary control. The other controls are supporting.

Five implications for the architecture function

The implications for the architecture function are substantial. Five that I would recommend any architecture leader work through explicitly.

1. The integration of the enterprise identity provider becomes a non-negotiable

A consequence of the identity-first posture is that every production system needs to integrate with the enterprise identity provider. Local user accounts, shared service accounts, and the various forms of out-of-band authentication that have accumulated in most enterprises over the years become, in this model, exceptions that require explicit justification.

The architectural implication is that the question "does this system integrate with our identity provider" moves from a desirable property to a hard prerequisite. New application onboarding, vendor selection, and acquisition integration each need to apply this filter. Applications that do not support modern federation protocols — OpenID Connect, SAML 2.0, SCIM for provisioning — are increasingly difficult to justify in a regulated environment.

For the existing estate, the architecture function should expect to spend a non-trivial proportion of its modernisation effort on identity integration retrofits. This is unglamorous work that rarely produces a visible new business capability. It is, however, the work that materially reduces the organisation's security exposure, and as such belongs near the top of the modernisation backlog.

2. Service-to-service authentication needs the same discipline

The identity-first posture is sometimes applied conscientiously to human users while being overlooked for service-to-service traffic. This is a meaningful gap. The lateral movement that produces serious breaches typically does not involve human user accounts; it involves compromised service credentials, over-privileged service accounts, and the various forms of implicit trust between internal systems.

The architectural response is to treat service identity with the same rigour as user identity. Specifically: every service has a named identity, ideally backed by an OAuth 2.0 or 2.1 client credentials flow or equivalent. Service credentials are short-lived (ideally minutes, not days). Permissions granted to a service are scoped to the specific resources it requires, not to its containing system. Service authentication is logged at the same level of detail as user authentication. And shared service accounts (the credential that several systems use to authenticate to a database, for instance) are eliminated.
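The properties above (a named identity, a short lifetime, narrow scopes) can be illustrated with a toy token issuer built on the standard library. This is not OAuth and is not suitable for production: the signing key, the claim names and the scope strings are hypothetical, and a real deployment would rely on the identity provider's token service.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical; a real issuer manages keys


def issue_token(service, scopes, ttl_s=300, now: Optional[float] = None):
    """Issue a signed token naming one service, its scopes and a short expiry."""
    issued = time.time() if now is None else now
    claims = {"sub": service, "scope": sorted(scopes), "exp": issued + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def check_token(token, required_scope, now: Optional[float] = None):
    """Accept only an untampered, unexpired token that carries the required scope."""
    body, _, sig = token.partition(".")
    expected = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    return current < claims["exp"] and required_scope in claims["scope"]


token = issue_token("billing-service", ["invoices:read"], ttl_s=300)
can_read = check_token(token, "invoices:read")
can_write = check_token(token, "invoices:write")
```

A five-minute lifetime means a leaked credential is of limited use, and the scope check means the billing service cannot write invoices even though it can read them.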

This is non-trivial work, particularly in legacy estates. It is also the work that closes the largest single category of security gap I encounter in real production environments.

3. Permission models need to be explicit and reviewable

The identity-first model requires that the permissions granted to each principal be explicit, scoped, and auditable. In practice, this means moving away from broad role-based models toward more fine-grained access patterns, such as attribute-based access control where appropriate, just-in-time privilege elevation for high-risk operations, and time-bounded permissions for project-specific access.

The architectural decision is not which permission model to adopt in the abstract but how to structure the organisation's permission data such that the access decisions can be made, audited, and revoked at the granularity the regulatory environment requires. This is, increasingly, a data-architecture question as well as a security-architecture question, and one where the two functions need to be working in close partnership.

A practical pattern emerging in the better-organised functions I have observed is the codification of permission policies as declarative artefacts in source control, applied uniformly across the identity provider, the cloud platforms, and the application layer. This pattern, sometimes described as "policy as code", brings to the permission data that ultimately controls who can do what the same version control and review discipline that the rest of the engineering organisation already applies to its software.
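A minimal sketch of the idea, assuming a simple allow-list format of my own invention. The principal, action and resource names are hypothetical, and a real deployment would more likely use a dedicated policy engine than a hand-rolled evaluator.

```python
import fnmatch

# Declarative policy artefact. In practice this would live in a reviewed file
# in source control rather than inline in the code.
POLICIES = [
    {"principal": "group:finance-analysts", "actions": ["read"], "resource": "ledger/*"},
    {"principal": "service:billing", "actions": ["read", "write"], "resource": "invoices/*"},
]


def is_allowed(principals, action, resource, policies=POLICIES):
    """Default deny: allow only if some policy matches principal, action and resource."""
    return any(
        p["principal"] in principals
        and action in p["actions"]
        and fnmatch.fnmatchcase(resource, p["resource"])
        for p in policies
    )


analyst_can_read = is_allowed({"group:finance-analysts"}, "read", "ledger/2026-q1")
analyst_can_write = is_allowed({"group:finance-analysts"}, "write", "ledger/2026-q1")
```

Because the policy is data, a change to who can write to the ledger becomes a diff that can be reviewed, approved and reverted like any other change.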

4. The audit story is materially different

In a network-perimeter model, audit was largely a function of network logs — what traffic crossed which boundary, when. In an identity-first model, audit is a function of authorisation events — who attempted to access what, was the request authorised, what context informed the decision, and what action followed.

The implication is that the organisation's logging and event infrastructure needs to capture authorisation events at the level of detail required for regulatory audit. This includes, at minimum, the identity of the principal making the request, the resource being accessed, the permissions evaluated, the decision reached, and the context that informed the decision (including the device context, the network context, and any risk signals).
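As a sketch, the minimum fields listed above map onto a record like the following. The class name, field names and JSON-lines format are assumptions. The regulatory regime and the logging platform determine the real schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuthorisationEvent:
    principal: str               # who made the request
    resource: str                # what was being accessed
    permissions_evaluated: list  # which permissions were checked
    decision: str                # "allow" or "deny"
    device_context: dict
    network_context: dict
    risk_signals: list = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(event):
    """Serialise one event as a single JSON line for the logging pipeline."""
    return json.dumps(asdict(event), sort_keys=True)


line = to_log_line(AuthorisationEvent(
    principal="user:alice",
    resource="ledger/2026-q1",
    permissions_evaluated=["ledger:read"],
    decision="deny",
    device_context={"managed": False},
    network_context={"source": "public"},
    risk_signals=["new-device"],
))
```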

For organisations that have been operating under regulatory regimes with strong audit requirements — financial services, healthcare, defence — this is largely a continuation of an existing discipline, albeit one that needs to be extended to cover the broader set of identity events that the identity-first posture surfaces. For organisations newer to this level of audit detail, the implication is a meaningful investment in logging infrastructure, retention policy, and the analytics layer that turns authorisation events into actionable intelligence.

A related point that often goes underdiscussed: the audit trail generated by an identity-first posture is itself valuable as an input to the architecture function's own measurement practice. Patterns in authorisation events — which resources are accessed most frequently, which permissions are exercised most often, which access requests are denied — provide useful signal about the actual versus intended use of the estate.
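A small sketch of that kind of analysis, run over a handful of made-up events. The event shape is an assumption, reduced to the fields needed for counting.

```python
from collections import Counter

# Hypothetical authorisation events, reduced to the fields needed here.
events = [
    {"principal": "user:alice", "resource": "ledger", "decision": "allow"},
    {"principal": "user:bob", "resource": "ledger", "decision": "deny"},
    {"principal": "service:billing", "resource": "invoices", "decision": "allow"},
    {"principal": "user:bob", "resource": "ledger", "decision": "deny"},
]

# Which resources are requested most often, and who is denied most often.
access_by_resource = Counter(e["resource"] for e in events)
denials_by_principal = Counter(
    e["principal"] for e in events if e["decision"] == "deny"
)

most_accessed = access_by_resource.most_common(1)[0]
most_denied = denials_by_principal.most_common(1)[0]
```

A principal that is repeatedly denied the same resource is either probing or missing a permission it legitimately needs. Either way it is a signal worth the architecture function's attention.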

5. The vendor selection criteria change

The criteria the architecture function applies in evaluating new vendors need to incorporate the identity-first posture explicitly. Specifically:

Support for the organisation's federation protocols, in the specific versions and configurations the organisation requires.
Support for SCIM-based provisioning, with the data attributes the organisation maintains in its identity provider.
Granular permission models exposed through the vendor's API, rather than coarse role-based ones.
The vendor's own internal identity discipline: how the vendor authenticates and authorises its own staff when they access the customer's data, and what audit trail is provided for that access.

These criteria are, increasingly, table stakes for vendors selling into regulated enterprises. The architecture function's role is to ensure that the evaluation process applies them rigorously and that vendor exceptions are not granted on commercial grounds alone.

A working maturity framework

For architecture leaders looking to assess where their organisation sits on the journey, a four-stage maturity framework is useful.

Stage	Characteristics
Perimeter-centric (legacy)	Most production systems still depend on the network perimeter as the primary control. Local user accounts are common. Service authentication is via shared accounts or unscoped credentials. Lateral movement after a perimeter breach would be straightforward.
Federated (transitional)	Enterprise identity provider integrated with the majority of production systems. Single sign-on widely available. Some service-to-service authentication still relies on legacy credentials. Some legacy systems remain on local accounts as exceptions.
Identity-first (target)	Every production system integrated with the enterprise identity provider. Service-to-service authentication uses short-lived, scoped credentials. Permission policies are version-controlled. Audit trail captures authorisation events at the required level of detail.
Identity-aware (mature)	All the above, plus dynamic risk-based access decisions, just-in-time privilege elevation, and continuous monitoring of authorisation patterns. The identity layer itself is a primary source of security intelligence.

In my observation, most regulated enterprises sit in the "transitional" stage, with the more mature security organisations either at "target" or actively working through the gap to it. The move from "target" to "mature" is meaningful additional investment and is appropriate for organisations with elevated threat profiles.

For most organisations, the right ambition is reaching the "identity-first" stage and operating sustainably at it. The further maturity stage is a refinement, not a transformation.
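The maturity table above can be read as a rough decision procedure. The sketch below is one such reading, not a formal assessment method: the input names are hypothetical, and treating "the majority" as more than half of production systems is an assumption.

```python
from dataclasses import dataclass


@dataclass
class EstateSnapshot:
    """Hypothetical assessment inputs; all names are assumptions."""
    idp_coverage: float                      # share of production systems on the enterprise IdP (0..1)
    short_lived_service_credentials: bool
    policies_version_controlled: bool
    audit_trail_meets_requirements: bool
    risk_based_access: bool = False
    just_in_time_elevation: bool = False
    continuous_authorisation_monitoring: bool = False


def maturity_stage(s: EstateSnapshot) -> str:
    """Map a snapshot to one of the four stages in the table."""
    identity_first = (
        s.idp_coverage >= 1.0
        and s.short_lived_service_credentials
        and s.policies_version_controlled
        and s.audit_trail_meets_requirements
    )
    if identity_first:
        identity_aware = (
            s.risk_based_access
            and s.just_in_time_elevation
            and s.continuous_authorisation_monitoring
        )
        return "Identity-aware (mature)" if identity_aware else "Identity-first (target)"
    # "Majority" is read here as more than half; that cut-off is an assumption.
    if s.idp_coverage > 0.5:
        return "Federated (transitional)"
    return "Perimeter-centric (legacy)"
```

In practice the inputs would be evidenced rather than self-declared, and an organisation may well sit at different stages in different parts of the estate.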

Implications for the architecture function

Three broader implications.

The architecture function's role in security has expanded. The identity-first posture is fundamentally an architectural posture, not a tooling decision. The architecture function carries substantial responsibility for ensuring the posture is achievable across the estate, that the necessary integration patterns are documented and adopted, and that the exceptions to the posture are explicitly approved rather than silently tolerated. This is a meaningful expansion of the function's remit.

The boundary between architecture and security is becoming less useful as an organisational construct. In several organisations I have observed working through this shift, the historical separation between the architecture function and the security function has become a source of friction. The security function holds the policy authority. The architecture function holds the implementation authority across the estate. The work of bringing the estate into alignment with the identity-first posture requires close, sustained partnership between the two functions.

A pragmatic operating model that has emerged in better-organised enterprises is the establishment of a joint security architecture practice — a small standing forum, with named representatives from both functions, that takes joint ownership of the policy-to-pattern translation and of the implementation governance across the estate. This is not a structural reorganisation; it is a working model that respects the two functions' distinct authorities while ensuring they operate in step.

The investment case is different from the historical security investment case. Identity-first security does not have the visible, dramatic justification that perimeter security had — the "we built a wall, attackers were stopped at the wall" narrative. Its value is in the absence of harm, in the reduced blast radius of compromises that do occur, and in the cleaner regulatory posture that follows. This is a harder case to make to a finance function looking for clear quantifiable benefits.

The recommendation for architecture leaders working through this case is to anchor it in specific compliance and audit outcomes where they exist, in the demonstrable reduction in privileged access exposure as a measurable input to the regulatory reporting cycle, and in the operational simplification that comes from consolidating to a single identity provider. The case can be made; it requires more careful articulation than the historical security investment case did.

Closing

The identity-first posture is not a project. It is a sustained shift in how the enterprise treats access control, with implications across application design, vendor selection, audit, and the organisational relationship between architecture and security. The organisations that have made the transition cleanly are quietly better placed for the regulatory environment of the next decade than those that have not.

For architecture leaders who have not yet taken an explicit position on the maturity stage they are operating at and the one they intend to reach, that conversation is overdue. The work is substantial; the path is well-trodden; the alternative is to remain in a security model that the rest of the industry has moved past.

Architectural fitness functions: a practical framework for measuring enterprise architecture health

2026-07-02T00:00:00.000Z

Executive summary

The persistent challenge facing enterprise architecture functions is not the absence of strategy or the absence of documentation, but the difficulty of demonstrating measurable impact. A capability model is not a metric. A target operating model is not a measurement. A roadmap is a plan, not an outcome.

Architectural fitness functions — a concept introduced by Neal Ford, Rebecca Parsons and Patrick Kua in their work on evolutionary architecture — provide a structured response to this challenge. A fitness function is a measurable, ideally automated indicator of whether a given architectural property is being preserved or improved over time. The concept is not new; the practice, in most organisations I have observed, remains immature.

This piece sets out a practical framework, organised across six categories, that an architecture leader can adopt to bring measurement discipline to their function. Each category includes specific example metrics, the data sources required to compute them, and a brief commentary on common pitfalls.

The case for fitness functions

The traditional measurements applied to architecture functions — project delivery times, system uptime, vendor consolidation savings — have several shortcomings. They are outcome measures of the broader IT function rather than of the architectural choices specifically. They are often lagging indicators by some margin. And they tend to reward stability over improvement, which is the inverse of what an architecture function ought to be incentivised to deliver.

Fitness functions, by contrast, are designed to be leading indicators of architectural health. They measure properties that the architecture function has direct influence over, on a cadence short enough to drive corrective action, and in a form that can be discussed productively with non-architects.

The implementation requirement is modest. Most fitness functions can be computed from data the organisation already produces: source control activity, deployment logs, system metadata, cost dashboards, and security scans. The marginal cost of producing the measurements is small. The marginal benefit, particularly when the measurements are tracked over time and shared with the broader leadership team, is substantial.

What follows is a framework of six categories, each with three or four illustrative fitness functions. The framework is intended as a starting point; organisations should adapt and extend it to their specific context.

Category 1: Architectural alignment

Measurements of the degree to which the actual estate corresponds to the stated architecture.

Fitness function	Description	Data source
Standards conformance rate	The proportion of applications or services in the portfolio that conform to the published architectural standards (cloud-only deployment, container-based runtime, mandatory observability instrumentation, and similar).	Application portfolio metadata; CMDB; infrastructure tagging.
Capability coverage	The proportion of the published capability model that has at least one named owning application or service.	Capability model; portfolio mapping.
Reference architecture adoption	The proportion of new applications shipped in a given quarter that follow the published reference architecture, weighted by complexity.	Architecture review records; deployment metadata.
Exception backlog	The number of approved architectural exceptions currently active, and the median age of an open exception.	Exception register; Architecture Governance Board (AGB) records.

The objective in this category is not to drive every metric to 100%. Some level of deviation from the standard is healthy — it reflects the architecture function's response to genuine business need rather than rigid enforcement. The objective is to surface the trend. A standards conformance rate that has been declining for three quarters is a signal that warrants investigation, regardless of the absolute level.
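To make the trend-over-level point concrete, here is a minimal sketch of a standards conformance rate and a check for a sustained decline. The "conforms" flag and the quarterly history are hypothetical; in practice the flag would be derived from portfolio metadata or infrastructure tagging.

```python
def conformance_rate(applications):
    """Share of applications flagged as conforming to published standards.

    `applications` is a list of dicts with a hypothetical boolean key "conforms".
    """
    if not applications:
        raise ValueError("portfolio is empty")
    return sum(1 for app in applications if app["conforms"]) / len(applications)


def declining_for(rates, quarters=3):
    """True if the rate fell in each of the last `quarters` quarter-on-quarter steps."""
    if len(rates) < quarters + 1:
        return False  # not enough history to judge
    recent = rates[-(quarters + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


portfolio = [
    {"name": "orders", "conforms": True},
    {"name": "billing", "conforms": False},
    {"name": "portal", "conforms": True},
    {"name": "search", "conforms": True},
]
history = [0.82, 0.84, 0.81, 0.79, 0.76]  # oldest quarter first

print(conformance_rate(portfolio))
print(declining_for(history))
```

Note that three consecutive declines need four data points, so the check stays silent until enough history has accumulated.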

Category 2: Operational characteristics

Measurements of the system properties that the architecture is intended to produce.

Fitness function	Description	Data source
Deployment frequency	The median number of production deployments per service per week, across the portfolio.	CI/CD pipeline logs.
Lead time for change	The median elapsed time from code commit to production deployment, across the portfolio.	Source control; deployment pipeline.
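Mean time to recover	The time from incident detection to incident resolution, weighted by severity. The conventional name is kept, but the figure is best reported as a median so that a single long incident does not dominate it.	Incident management system.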
Change failure rate	The proportion of production deployments that result in a degraded customer experience or a rollback.	Deployment pipeline; incident records.

These four are the DORA metrics, well-established and widely adopted. They are operational measures rather than purely architectural ones, but the architecture function has substantial influence over each: deployment frequency is constrained by the architecture's ability to be deployed independently, lead time is constrained by coupling and integration complexity, and so on.

The architecture function should track these metrics not as a substitute for the engineering organisation tracking them, but as a leading indicator of where architectural intervention may be warranted. A persistent low deployment frequency in a particular domain is often a symptom of an architectural problem the team has stopped trying to fix.
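As a rough illustration of how three of these four measures fall out of deployment records, the sketch below works from a flat list of deployments. The record keys ("service", "committed_at", "deployed_at", "failed") are assumptions about what a pipeline export might contain.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import median


def deployment_frequency(deployments, weeks):
    """Median deployments per service per week over a window of `weeks` weeks.

    Services with no deployments in the window do not appear in the input
    and are therefore not counted; a fuller version would include them.
    """
    per_service = Counter(d["service"] for d in deployments)
    return median(count / weeks for count in per_service.values())


def lead_time_for_change(deployments):
    """Median elapsed time from commit to production deployment."""
    return median(d["deployed_at"] - d["committed_at"] for d in deployments)


def change_failure_rate(deployments):
    """Share of deployments flagged as degraded or rolled back."""
    return sum(1 for d in deployments if d["failed"]) / len(deployments)


t0 = datetime(2026, 6, 1, 9, 0)
deployments = [
    {"service": "orders", "committed_at": t0, "deployed_at": t0 + timedelta(hours=2), "failed": False},
    {"service": "orders", "committed_at": t0, "deployed_at": t0 + timedelta(hours=6), "failed": True},
    {"service": "billing", "committed_at": t0, "deployed_at": t0 + timedelta(hours=30), "failed": False},
    {"service": "billing", "committed_at": t0, "deployed_at": t0 + timedelta(hours=4), "failed": False},
]

print(deployment_frequency(deployments, weeks=1))
print(lead_time_for_change(deployments))
print(change_failure_rate(deployments))
```

Computing the figures per domain rather than across the whole portfolio is usually what surfaces the persistent outliers mentioned above.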

Category 3: Technical debt and modernisation

Measurements of the estate's evolution toward, or away from, a modern technical baseline.

Fitness function	Description	Data source
End-of-life exposure	The number of production services running on technology versions that have passed, or are within twelve months of, vendor end-of-life.	Vulnerability scanning; CMDB.
Security patch latency	The median time between a critical security advisory being published and the patched version being deployed to production, across the portfolio.	Vulnerability management system.
Dependency currency	The median age of the dependencies used across the portfolio, relative to the latest stable releases.	Software bill of materials; dependency scanning.
Modernisation rate	The proportion of the legacy application portfolio that has been retired, replaced, or substantially modernised in the trailing twelve months.	Application portfolio; project records.

This category is often the most politically charged. End-of-life exposure in particular is a measure that frequently surfaces uncomfortable realities. The discipline is to publish the measurement, agree the threshold above which intervention is required, and track progress against it. The measurement itself does not produce the modernisation; it produces the conversation that funds the modernisation.
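A minimal sketch of the end-of-life exposure measure and its threshold check follows. It assumes each service record carries a vendor end-of-life date, and it counts services already past end-of-life as exposed as well as those within the horizon; the record keys and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def end_of_life_exposure(services, today, horizon_days=365):
    """Names of services past, or within `horizon_days` of, vendor end-of-life.

    `services` is a list of dicts with hypothetical keys "name" and
    "eol_date" (a datetime.date), as might be exported from a CMDB.
    """
    cutoff = today + timedelta(days=horizon_days)
    return sorted(s["name"] for s in services if s["eol_date"] <= cutoff)


def needs_intervention(exposed, agreed_threshold):
    """True when the exposed count is above the agreed threshold."""
    return len(exposed) > agreed_threshold


services = [
    {"name": "claims-api", "eol_date": date(2026, 3, 31)},    # already past
    {"name": "quote-engine", "eol_date": date(2027, 1, 15)},  # within a year
    {"name": "portal", "eol_date": date(2029, 6, 30)},        # comfortably supported
]
exposed = end_of_life_exposure(services, today=date(2026, 7, 2))
print(exposed, needs_intervention(exposed, agreed_threshold=1))
```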

Category 4: Cost and resource efficiency

Measurements of the architecture's economic characteristics.

Fitness function	Description	Data source
Cost per business transaction	The total infrastructure and operating cost attributed to a defined business transaction (a customer onboarding, an order processed, a report generated), measured monthly.	Cloud billing; transaction logging.
Cloud utilisation	The proportion of provisioned cloud capacity that is meaningfully utilised in a given month.	Cloud monitoring.
Vendor concentration	The number of distinct vendors providing comparable capabilities across the portfolio, and the cost weighting across them.	Contract register; cost allocation.
AI workload economics	For organisations with significant generative AI workloads, the cost per inference, the prompt-cache hit rate, and the percentage of cost attributable to retries.	LLM gateway logs; cost allocation.

The last of these four is increasingly relevant. Generative AI workloads have a cost profile that is unusually sensitive to small architectural decisions — the discipline around prompt caching, the choice of model for a given task, the design of retrieval — and these decisions are the architecture function's territory. A fitness function focused on AI workload economics provides the visibility that lets the architecture function intervene before costs become material.
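The three AI workload figures named in the table can be sketched from per-request gateway records as below. The record keys ("cost", "cache_hit", "is_retry") are assumptions about what an LLM gateway log might expose, and costs are shown in arbitrary units.

```python
def ai_workload_economics(requests):
    """Compute cost per inference, prompt-cache hit rate and retry cost share.

    Each request is a dict with hypothetical keys:
      "cost"      - cost of the call, in any consistent unit
      "cache_hit" - True if the prompt cache was used
      "is_retry"  - True if the call repeated an earlier failed call
    """
    if not requests:
        raise ValueError("no requests in the window")
    total_cost = sum(r["cost"] for r in requests)
    retry_cost = sum(r["cost"] for r in requests if r["is_retry"])
    return {
        "cost_per_inference": total_cost / len(requests),
        "cache_hit_rate": sum(1 for r in requests if r["cache_hit"]) / len(requests),
        "retry_cost_share": retry_cost / total_cost if total_cost else 0.0,
    }


requests = [
    {"cost": 4, "cache_hit": False, "is_retry": False},
    {"cost": 1, "cache_hit": True, "is_retry": False},
    {"cost": 4, "cache_hit": False, "is_retry": True},
    {"cost": 1, "cache_hit": True, "is_retry": False},
]
metrics = ai_workload_economics(requests)
print(metrics)
```

Breaking the same figures down by model and by calling service is typically where the architectural levers (caching discipline, model choice, retrieval design) become visible.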

Category 5: Security and compliance posture

Measurements of the estate's security and regulatory standing.

Fitness function	Description	Data source
Identity coverage	The proportion of production systems integrated with the organisation's enterprise identity provider, as opposed to maintaining local user accounts.	Identity provider; CMDB.
Secrets sprawl	The number of secrets in the secrets management system, the number found outside it (in environment files, configuration repositories, etc.), and the ratio between the two.	Secrets scanner; vault audit logs.
Audit log completeness	The proportion of production systems producing audit logs that meet the organisation's published retention and detail requirements.	Logging infrastructure.
Privileged access exposure	The number of standing privileged access grants across the production estate, and the proportion of privileged access activity that is just-in-time provisioned.	Identity provider; PAM solution.

For organisations subject to material regulatory oversight, these measurements are independently useful as inputs to the regulatory reporting and audit cycle. The architecture function's responsibility here is to set the target, not necessarily to operate the measurement infrastructure, which generally sits with the security function.
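Two of the measurements in this category reduce to simple ratios, sketched below. The record keys ("production", "uses_enterprise_idp") and the counts are hypothetical; real inputs would come from the identity provider, the CMDB and a secrets scanner.

```python
def identity_coverage(systems):
    """Share of production systems integrated with the enterprise identity provider."""
    production = [s for s in systems if s["production"]]
    if not production:
        raise ValueError("no production systems listed")
    return sum(1 for s in production if s["uses_enterprise_idp"]) / len(production)


def secrets_sprawl(managed_count, unmanaged_count):
    """Return both counts and the ratio of unmanaged to managed secrets."""
    ratio = unmanaged_count / managed_count if managed_count else float("inf")
    return {"managed": managed_count, "unmanaged": unmanaged_count, "ratio": ratio}


systems = [
    {"name": "orders", "production": True, "uses_enterprise_idp": True},
    {"name": "billing", "production": True, "uses_enterprise_idp": True},
    {"name": "legacy-crm", "production": True, "uses_enterprise_idp": False},
    {"name": "sandbox", "production": False, "uses_enterprise_idp": False},
]
coverage = identity_coverage(systems)
sprawl = secrets_sprawl(managed_count=200, unmanaged_count=10)
print(coverage, sprawl)
```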

Category 6: Knowledge and decision quality

Measurements of the architecture function's documentation and decision-making practice itself.

Fitness function	Description	Data source
Decision throughput	The number of Architecture Decision Records authored per quarter, normalised by the size of the architecture function.	Documentation repository.
Decision lead time	The median elapsed time from a decision being proposed to being formally accepted.	ADR metadata.
Knowledge accessibility	The proportion of architecture documentation that has been queried via the internal knowledge system in the trailing month.	Documentation analytics; LLM assistant logs.
Onboarding effectiveness	A periodic survey-based measure of how quickly new architects feel productive after joining the function, with a target benchmark.	Internal survey.

These measurements address the architecture function's own operating model, which is rarely measured but is materially important. An architecture function with a slow decision lead time becomes the bottleneck it was supposed to alleviate. An architecture function whose knowledge base is not being consulted is not earning its keep as a custodian of organisational memory.
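Decision lead time and decision throughput are straightforward to compute once ADR metadata carries proposal and acceptance dates. The sketch below assumes hypothetical keys "proposed_on" and "accepted_on", and leaves still-pending ADRs out of the lead-time figure.

```python
from datetime import date
from statistics import median


def decision_lead_time_days(adrs):
    """Median days from proposal to acceptance, over accepted ADRs only."""
    durations = [
        (a["accepted_on"] - a["proposed_on"]).days
        for a in adrs
        if a.get("accepted_on") is not None
    ]
    if not durations:
        raise ValueError("no accepted ADRs in the sample")
    return median(durations)


def decision_throughput(adr_count, architect_count):
    """ADRs authored in the quarter per architect in the function."""
    return adr_count / architect_count


adrs = [
    {"id": "ADR-0101", "proposed_on": date(2026, 4, 1), "accepted_on": date(2026, 4, 6)},
    {"id": "ADR-0102", "proposed_on": date(2026, 4, 10), "accepted_on": date(2026, 4, 22)},
    {"id": "ADR-0103", "proposed_on": date(2026, 5, 1), "accepted_on": date(2026, 5, 31)},
    {"id": "ADR-0104", "proposed_on": date(2026, 6, 1), "accepted_on": None},  # still pending
]
print(decision_lead_time_days(adrs))
print(decision_throughput(adr_count=len(adrs), architect_count=8))
```

Excluding pending records flatters the figure when a backlog is building, so the count of open proposals is worth publishing alongside it.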

Implementation considerations

Five practical points for organisations adopting this framework.

Begin with a small set. Six categories with three to four metrics each is eighteen to twenty-four measurements. That is too many to operationalise at once. I would recommend selecting one metric from each category as the initial set, establishing the data pipeline and the publication cadence, and then extending once the practice is established.

Publish the measurements visibly. The benefit of fitness functions accrues from the conversation they generate, not from the measurement itself. The measurements should be visible to the leadership team, ideally as a standing item on the relevant governance forum. A dashboard that exists but is not reviewed has no effect.

Establish thresholds, not just measurements. Each fitness function should have a target threshold, the level at or above which the architecture function considers the property to be in a healthy state, and a trigger threshold, below which intervention is required. For measurements where lower is better, such as security patch latency or the exception backlog, the direction is reversed. Without thresholds, the measurements become decorative.
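A two-threshold check of this kind might be sketched as follows. The three status labels, and in particular the intermediate "watch" band between the two thresholds, are assumptions for illustration; the function also handles measurements where a lower value is the healthier one.

```python
def assess(value, target, trigger, higher_is_better=True):
    """Classify a measurement as "healthy", "watch" or "intervene"."""
    if not higher_is_better:
        # Flip signs so the same comparisons work for lower-is-better metrics.
        value, target, trigger = -value, -target, -trigger
    if trigger > target:
        raise ValueError("trigger must sit on the unhealthy side of target")
    if value >= target:
        return "healthy"
    if value < trigger:
        return "intervene"
    return "watch"


# Standards conformance rate: higher is better.
print(assess(0.80, target=0.90, trigger=0.75))
# Security patch latency in days: lower is better.
print(assess(45, target=7, trigger=30, higher_is_better=False))
```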

Treat the measurements as inputs to decisions, not as performance indicators of individuals. The temptation to use fitness functions as performance management indicators for engineers or architects should be resisted. The measurements are diagnostic; they identify where the architecture needs attention, not who is to blame for it needing attention. Using them as individual performance metrics will produce the predictable behavioural distortions and will degrade the quality of the measurement over time.

Review the framework annually. The set of fitness functions that matters to an organisation evolves as the organisation evolves. A measurement that was critical eighteen months ago may have served its purpose. A new measurement may now be needed. The architecture leadership should review the framework on a defined cadence, retiring measurements that have ceased to provide value and adding new ones as needed.

Implications for architecture leaders

The broader implication of adopting a fitness functions framework is that the architecture function moves from a function defined by its deliverables — the artefacts it produces — to a function defined by its measurable outcomes. This is, in my view, a necessary evolution for the discipline.

The architecture function that can demonstrate, with data, that the estate's standards conformance is improving, that technical debt is being addressed at a defined rate, that cloud utilisation is rising and unit costs are falling, that the security posture is strengthening, and that the function's own decision throughput is healthy, has a fundamentally different conversation with the executive team than the function that produces an annual capability model refresh and a target operating model that nobody reads.

For architecture leaders considering this shift, the recommendation is to begin small, to publish openly, and to allow the measurements to drive the conversation rather than to dictate the conclusions. The framework above is one starting point. The work of adapting it to a specific organisational context is itself a useful exercise in articulating what the architecture function is for.

The evolving role of architecture decision records in the age of generative AI

2026-06-25T00:00:00.000Z

Executive summary

Architecture Decision Records (short, structured documents capturing why a particular design choice was made, what was considered, and what was rejected) have been an established practice for well over a decade. They have become, quietly, one of the most useful artefacts in any mature architecture function. They are inexpensive to write, easy to read, and hold their value across personnel changes in a way that few other architecture deliverables do.

What has changed in the last eighteen months is the cost structure of producing them. The combination of generative AI tooling, modern source-control workflows, and the maturing ecosystem of architecture linters means that the marginal cost of authoring an ADR has fallen substantially. This has changed the economics of the practice and, with it, the right operating model.

This piece sets out how I have seen ADR practice evolve, the five considerations I would recommend any architecture leader work through when reviewing their own approach, and the implications for the broader architecture function.

The state of ADR practice in 2026

The ADR pattern, in its simplest form, has not changed since Michael Nygard's original write-up in 2011. A record captures a decision title, the context in which it was made, the decision itself, the alternatives considered, and the consequences. It is stored alongside the code or configuration it relates to, typically in a directory called docs/adr or equivalent, and is treated as an immutable artefact: superseded decisions are replaced by new records that reference the original, not by edits in place.

What has changed is the surrounding context. Three observations that frame the rest of this piece.

First, the volume of architecture decisions has increased materially. The combination of microservices proliferation, cloud service sprawl, and the now-routine integration of generative AI components into enterprise systems has multiplied the number of choice points at which an ADR is the appropriate response. A mid-sized enterprise that ten years ago might have produced a dozen ADRs per year will, in a comparable function today, produce many times that number.

Second, the cost of producing each individual ADR has fallen. Modern coding assistants can draft a reasonable first version from a short briefing, producing the structural elements (context, options, decision, consequences) in seconds. The architect's time is no longer consumed by the structural drafting; it is consumed by the substantive review and the judgement calls that the assistant cannot make.

Third, the readership has broadened. ADRs were once read primarily by other architects. The combination of expanded technical literacy across product and engineering teams, the rise of internal LLM-based knowledge tooling that surfaces ADRs in response to natural-language questions, and the broader push for transparency in technical decision-making means that the audience for an ADR today extends well beyond the original architecture community.

The combined effect is a practice that is more valuable than it has ever been and is being executed at greater scale and lower unit cost. The risk is not that ADRs become irrelevant. The risk is that they become voluminous, inconsistently authored, and poorly governed — the same trap that documentation practices have fallen into before.

Five considerations for an evolving ADR practice

In conversations with architecture leaders across industries over the past year, five considerations recur. Each merits an explicit position in the architecture function's operating model.

1. The role of the LLM in the drafting process

There is no longer a question of whether generative AI tooling will be involved in ADR authorship. It will. The question is at what point in the drafting workflow, with what guardrails, and with what attribution.

A workable pattern I have seen in practice is the following. The architect, or the engineer making the decision, briefs the LLM on the context — typically a short paragraph or a set of bullet points describing what is being decided and why. The LLM produces a first draft. The draft is reviewed, edited substantively, and signed off by a named human author. The record is committed to source control with a co-authorship attribution that makes the AI involvement explicit. The reviewer in a subsequent ADR review process is aware that the document originated as an AI draft.

The pitfalls to avoid in this pattern are familiar. The first is under-editing — accepting the draft as written when it would not have been accepted from a human contributor. The second is the opposite: over-editing, which loses the time efficiency the assistance was meant to provide. The third is the loss of decision provenance, where the rationale captured in the ADR is the assistant's plausible synthesis rather than the actual reasoning of the decision-maker. All three are addressable through review discipline, but they require explicit attention.
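For the attribution step in the pattern above, one common convention is a `Co-authored-by` trailer at the end of the commit message. The sketch below only builds the message text; the subject-line wording, the names and the address are placeholders, and the exact convention an organisation adopts is its own choice.

```python
def adr_commit_message(adr_id, title, human_author, assistant_name, assistant_email):
    """Build a commit message whose trailer makes the AI involvement explicit.

    The human author remains the author of record; the assistant appears
    only in the trailer. All names and addresses are placeholders.
    """
    subject = f"Add {adr_id}: {title}"
    body = f"Drafted with AI assistance; reviewed and signed off by {human_author}."
    trailer = f"Co-authored-by: {assistant_name} <{assistant_email}>"
    # Trailers sit in the final paragraph, separated from the body by a blank line.
    return f"{subject}\n\n{body}\n\n{trailer}\n"


message = adr_commit_message(
    adr_id="ADR-0042",
    title="Use short-lived credentials for service calls",
    human_author="A. Architect",
    assistant_name="Internal Drafting Assistant",
    assistant_email="assistant@example.invalid",
)
print(message)
```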

2. The standard template

Many organisations have evolved their ADR template over time, often adding fields specific to their context. In an AI-assisted authorship model, the consistency of the template matters more than it did, because the assistant performs significantly better when working to a well-defined structure.

I would recommend any architecture leader formally codify the organisation's ADR template, publish it as a Markdown file in the documentation repository, and ensure that the prompt used to brief the AI assistant references that template explicitly. A template that includes:

Title and unique identifier
Status (Proposed, Accepted, Superseded, Deprecated)
Date and author(s)
Context (the situation requiring the decision)
Decision (the position taken)
Options considered (with a brief assessment of each)
Consequences (positive and negative)
Related records (links to relevant prior ADRs)

provides sufficient structure for an LLM to produce useful drafts, and is parseable by tooling for downstream uses such as automated dashboards or compliance reporting.
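To illustrate the "parseable by tooling" point, here is a minimal sketch of a parser and validator for ADRs written to such a template. The Markdown layout it expects (a first line of the form `# ADR-0001: Title`, then one `## ` heading per field) and the exact section names are assumptions, not a published standard.

```python
import re

REQUIRED_SECTIONS = [
    "Status", "Date", "Authors", "Context", "Decision",
    "Options considered", "Consequences", "Related records",
]
ALLOWED_STATUSES = {"Proposed", "Accepted", "Superseded", "Deprecated"}


def parse_adr(markdown_text):
    """Parse an ADR written to the assumed layout into a dict of sections."""
    lines = markdown_text.strip().splitlines()
    match = re.match(r"#\s+(ADR-\d+):\s+(.+)", lines[0])
    if not match:
        raise ValueError("first line must look like '# ADR-0001: Title'")
    record = {"id": match.group(1), "title": match.group(2).strip()}
    current = None
    for line in lines[1:]:
        if line.startswith("## "):
            current = line[3:].strip()
            record[current] = ""
        elif current is not None:
            record[current] = (record[current] + "\n" + line).strip()
    return record


def validate_adr(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if not record.get(s)]
    status = record.get("Status", "")
    # Only the first word is checked, so "Superseded by ADR-0007" is accepted.
    if status and status.split()[0] not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status}")
    return problems


sample = """
# ADR-0042: Use short-lived credentials for service-to-service calls

## Status
Accepted

## Date
2026-05-14

## Authors
A. Architect

## Context
Services currently share long-lived credentials.

## Decision
Issue scoped, short-lived credentials from the identity provider.

## Options considered
Keep shared credentials; rotate them monthly; adopt short-lived credentials.

## Consequences
Smaller blast radius; more moving parts in the credential issuer.

## Related records
ADR-0017
"""
record = parse_adr(sample)
print(validate_adr(record))
```

A check like this could run in the merge pipeline, so that a draft (AI-assisted or otherwise) missing a required section is caught before review.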

3. The review and approval workflow

The review process is where the practice most often breaks down at scale. Two common patterns are worth being explicit about.

The first is decision-level review, where every ADR is reviewed by an Architecture Governance Board or equivalent. This works well at low volumes but becomes a bottleneck at scale. It also tends to shift the centre of gravity in the practice from "documenting decisions made by the team" to "decisions made by the AGB and documented retrospectively", which is a subtle but meaningful inversion.

The second is categorical review, where ADRs are classified at authorship into tiers — typically something like "team-level" (merged with code review), "domain-level" (reviewed by the relevant domain architect), and "enterprise-level" (reviewed by the AGB). This pattern scales more comfortably and preserves the principle that the team closest to the decision is the one capturing it. It requires a clear classification rubric, which should itself be published.

For organisations adopting AI-assisted authorship at scale, I would expect categorical review to become the more common pattern. The volume of ADRs that the function will need to handle is unlikely to be sustainable under decision-level review.
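A published classification rubric can be as small as a single function. The sketch below is deliberately simple and the criteria it uses (number of teams and domains affected, and whether an enterprise standard changes) are assumptions; each organisation would substitute its own.

```python
def review_tier(teams_affected, domains_affected, changes_enterprise_standard):
    """Assign an ADR to a review tier using a deliberately simple rubric."""
    if changes_enterprise_standard or domains_affected > 1:
        return "enterprise-level"   # reviewed by the AGB
    if teams_affected > 1:
        return "domain-level"       # reviewed by the relevant domain architect
    return "team-level"             # merged with code review


print(review_tier(teams_affected=1, domains_affected=1, changes_enterprise_standard=False))
print(review_tier(teams_affected=3, domains_affected=1, changes_enterprise_standard=False))
print(review_tier(teams_affected=3, domains_affected=2, changes_enterprise_standard=False))
```

Encoding the rubric this way makes the tiering auditable: a reviewer can see which answer put a record in which tier.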

4. ADRs as a queryable corpus

Once an organisation has a meaningful body of ADRs in source control — say, a hundred or more across the architecture function — the corpus itself becomes a valuable asset. A common use case emerging in practice is the integration of the ADR corpus with an internal LLM-based assistant, allowing architects and engineers to ask natural-language questions such as "what is our position on microservices boundaries for transaction-processing workloads" or "have we previously rejected the use of a particular technology, and on what grounds".

This use case is straightforward to implement with current retrieval-augmented generation tooling. The architectural requirements are modest: a consistent metadata schema across ADRs, a well-defined storage layout, an indexing pipeline (which can be as simple as a nightly job), and an integration point with the organisation's preferred internal assistant platform.
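The indexing and lookup steps can be sketched without any external tooling. The example below uses a plain keyword index as a stand-in for the embedding-based retrieval a retrieval-augmented setup would more typically use, and it holds the ADR texts in memory; the identifiers and texts are invented for illustration.

```python
import re
from collections import defaultdict


def tokenise(text):
    """Lower-case word tokens; deliberately naive."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(adrs):
    """Map each token to the set of ADR identifiers that contain it."""
    index = defaultdict(set)
    for adr_id, text in adrs.items():
        for token in tokenise(text):
            index[token].add(adr_id)
    return index


def search(index, question, limit=3):
    """Rank ADRs by how many of the question's tokens they contain."""
    scores = defaultdict(int)
    for token in tokenise(question):
        for adr_id in index.get(token, ()):
            scores[adr_id] += 1
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [adr_id for adr_id, _ in ranked[:limit]]


adrs = {
    "ADR-0007": "Microservices boundaries for transaction-processing workloads follow the ledger domain.",
    "ADR-0012": "Rejected: graph database for customer profiles, on operational cost grounds.",
    "ADR-0031": "Adopt an event bus for cross-domain integration.",
}
index = build_index(adrs)
results = search(
    index,
    "what is our position on microservices boundaries for transaction-processing workloads",
)
print(results)
```

In a fuller pipeline, the nightly job would rebuild the index from the ADR directory in source control, and the top-ranked records would be passed to the assistant as context for its answer.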

The benefit, in my observation, is twofold. First, decisions are re-applied consistently across teams, reducing the rate at which the same question is debated repeatedly in different contexts. Second, new architects joining the function have a much faster on-ramp to the organisation's established positions, which would otherwise be tribal knowledge.

5. The retention and supersession policy

ADRs are immutable in content, but they are not eternal. An ADR that established the organisation's position on a now-obsolete technology is a historical record, not a current standard. The architecture function needs a clear policy on supersession: when a new ADR supersedes an older one, the only change made to the older record is a status update to "Superseded by ADR-NNN"; its content is not edited and the record is not deleted.

Less commonly discussed but equally important is the retention policy for ADRs that have been superseded for a long time. The straightforward answer is that ADRs are kept indefinitely. Source control storage is inexpensive, and the historical value of being able to trace the evolution of the organisation's architecture position over time is meaningful, particularly in regulated contexts where post-hoc audit may require reconstructing the reasoning behind a decision made some years prior.

The architecture function should publish its supersession and retention policy as part of the ADR practice documentation.
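The supersession rule can be expressed in a few lines. In the sketch below, records are plain dictionaries keyed by ADR identifier; the "status" and "supersedes" keys and the helper names are assumptions. Superseding changes only the status of the older record, nothing is deleted, and the chain can be followed to the position currently in force.

```python
PREFIX = "Superseded by "


def supersede(records, old_id, new_id):
    """Return a copy of `records` with `old_id` marked as superseded by `new_id`."""
    if old_id not in records or new_id not in records:
        raise KeyError("both records must already exist")
    if old_id == new_id:
        raise ValueError("a record cannot supersede itself")
    updated = dict(records)
    updated[old_id] = {**records[old_id], "status": f"{PREFIX}{new_id}"}
    updated[new_id] = {**records[new_id], "supersedes": old_id}
    return updated


def current_position(records, adr_id):
    """Follow the supersession chain to the record currently in force."""
    seen = set()
    while records[adr_id]["status"].startswith(PREFIX):
        if adr_id in seen:
            raise ValueError("supersession cycle detected")
        seen.add(adr_id)
        adr_id = records[adr_id]["status"][len(PREFIX):]
    return adr_id


records = {
    "ADR-0005": {"status": "Accepted"},
    "ADR-0020": {"status": "Accepted"},
    "ADR-0033": {"status": "Accepted"},
}
records = supersede(records, "ADR-0005", "ADR-0020")
records = supersede(records, "ADR-0020", "ADR-0033")
print(current_position(records, "ADR-0005"))
```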

Implications for the architecture function

Three broader implications for architecture leaders.

The architecture function's documentation discipline is becoming a competitive advantage. In a context where decision velocity is high and the cost of capturing decisions has fallen, the organisations that have established robust documentation practices will accumulate a strategic asset over time — a queryable record of why their architecture is what it is. The organisations that have not will increasingly struggle with consistency, with on-boarding, and with the regulatory expectations that are emerging in several sectors.

The skills profile of architects is shifting. The skill of producing a clean ADR draft has been substantially commoditised by AI tooling. The skill of recognising when an ADR is required, of asking the right questions to surface the actual decision context, and of facilitating a substantive review discussion has become correspondingly more important. Architecture leaders should consider this shift in their hiring and development plans.

The relationship between the architecture function and engineering teams is changing. ADRs that are written by engineering teams with AI assistance, reviewed by domain architects, and escalated to the AGB only for genuinely cross-cutting decisions, represent a meaningful redistribution of architectural authorship across the organisation. This is, in my view, a positive evolution. The architecture function's role becomes more about facilitation and curation than about authorship, which is a better use of senior architectural expertise.

A note on related practice

Architecture Decision Records sit alongside several other documentation practices that have evolved in parallel over the past decade — capability models, value stream maps, technology radars, and the broader category of internal technical writing. The considerations above apply, with appropriate adjustments, to each of these. The opportunity for architecture leaders is to take a coherent view across the whole documentation practice rather than treating ADRs as a standalone discipline.

The pieces that follow in this series will examine that broader practice — including the role of fitness functions in measuring architectural health, and the changing shape of identity and security architecture — through the same lens of considered, practical evolution.

What an acquisition-heavy company actually needs from its architects

2026-06-18T00:00:00.000Z

TL;DR

There is a particular kind of company that the standard enterprise architecture playbook does not fit. Private-equity-backed roll-ups and other acquisition-heavy growth businesses acquire several small companies per year and integrate them at varying degrees of depth. The textbook EA function — the operating model, the Architecture Governance Board, the capability model, the standards forum — is calibrated for a stable enterprise where the underlying portfolio changes slowly. In a roll-up, the portfolio changes by a third every two years. The textbook function spends its first eighteen months catching up to the perimeter and then never gets ahead of it.

This piece sets out what I think the architecture function actually owns in an acquisition-heavy business, what it deliberately does not own, and a working framework for the part it does own. It is opinionated because the pattern is under-discussed; I have not seen good public writing on it and I have made most of the mistakes I describe.

The argument in one sentence: in an acquisition-heavy company, the architecture function's job is not to enforce a target end-state. It is to make the next acquisition cheaper than the last one, the one after cheaper still, and the seventh one cheap enough that nobody asks architecture for permission.

The pattern

An acquisition-heavy business is one where the company acquires three to ten other businesses per year, each one small relative to the parent, and integrates them at some level. The acquisitions might be in the same business vertical (the classic roll-up: many small dental practices, many small accounting firms, many small HVAC companies, many small renewable-energy operators) or in adjacent verticals (a platform business absorbing complementary capabilities). The economic logic is different in each case but the architecture problem is similar.

What is similar:

The parent company starts with a single set of systems, usually decent quality, often standardised.
Each acquisition arrives with its own stack. The stack is almost always a mix of: one or two SaaS products the team genuinely needs, several SaaS products they tolerate, a couple of internal applications that nobody understands, and a thick layer of spreadsheets that turn out to be load-bearing.
The acquisition closes legally before the technology has been fully understood, let alone integrated.
Operating leverage from the acquisition depends on integration happening, but the timeline pressure is intense and the integration team is small.
The next acquisition arrives before the current one is fully integrated.

That last point is the one that makes this distinct from single-large-acquisition integration work. A merger of equals or a single-bet acquisition is a one-time event with a defined end. A roll-up is a continuous flow. The architecture function is running a pipeline, not delivering a project.

Why textbook EA doesn't fit

A standard enterprise architecture function builds toward a steady state. It defines the target operating model, populates the capability model, sets standards, and runs a governance board that keeps drift in check over time. The model assumes a portfolio that is mostly stable and gradually evolved.

This doesn't survive contact with continuous acquisition. Three things break:

The target end-state assumption breaks. Defining a target end state requires that the inputs to the model are stable enough to plan against. They are not. Six months into a target-state definition, two new acquisitions have arrived with stacks that weren't in the model. The target state has to be re-cut. After three or four re-cuts, the team gives up on target-state work and the operating model becomes "react to what's in front of us".

The standards body breaks. Standards are useful when they apply to a long-lived portfolio. They are less useful when half the portfolio has been part of the company for less than two years and came in with a stack the standards never anticipated. The Architecture Governance Board ends up either approving everything (rubber stamp) or refusing everything (bottleneck). Neither produces useful governance.

The capability model breaks. The capability model is supposed to be a stable spine. Acquired businesses have capabilities that overlap with the parent's but at different boundaries, different granularities, different ownership patterns. Trying to force the acquired business's capability map onto the parent's is the kind of work that produces three months of consultancy fees and zero business value. The opposite — keeping every acquisition's model separate — produces a sprawl with no integration story.

The textbook function isn't wrong; it is calibrated for a different problem. The problem an acquisition-heavy company actually has is not "design the target state". It is "make the integration of the next acquisition cheap, fast, and reversible".

What architecture actually owns in this context

The architecture function in an acquisition-heavy business owns three specific things and explicitly does not own several others.

Owns: the integration pattern. The repeatable playbook for how an acquired business connects to the parent's data, systems, and processes. Not a one-off project plan; a pattern that can be parameterised for each acquisition. The pattern includes the sequence (what happens in days 30, 60, 90, 180), the technical decisions (which acquired systems are kept, which are sunset, which are migrated), the governance touchpoints (which integration decisions require AGB sign-off and which are routine), and the rollback posture (what to do if the integration uncovers problems that weren't in due diligence).

Owns: the federated data model. A small, deliberately limited data model at the parent level that lets the parent's systems talk about acquired businesses without requiring every acquired business to conform to a single master schema. The model captures the minimum data the parent needs (legal entity, business unit, revenue attribution, customer-of-record, employee count, financial consolidation point) and lets each acquired business retain its own operational schemas underneath. The architecture function is the keeper of the federation, not the harmoniser.

Owns: the vendor consolidation calendar. The acquired businesses arrive with overlapping vendor relationships. The parent has its own. The right answer is rarely "consolidate everything to the parent's vendors on day one"; the right answer is a calendar that sequences vendor consolidation against renewal dates, integration priorities, and the architecture function's own bandwidth. The calendar is a living artefact, updated each quarter.

What architecture explicitly does not own in this context:

Does not own: the target operating model. That belongs to the operating partners and the COO. Architecture is a contributor, not the owner. Target operating models in roll-ups are political artefacts as much as technical ones; the architecture function does not have the authority or the visibility to drive them.

Does not own: the integration project plan for any specific acquisition. That belongs to the integration project lead, who is usually a programme manager attached to the M&A function. Architecture provides the pattern; it does not execute every instance.

Does not own: the capability model in any strong sense. The capability model in an acquisition-heavy company is a useful sketch, not a source of truth. Investing heavily in it produces diminishing returns. Keep it lightweight.

Does not own: standards enforcement on acquired companies during their first year. This is the most counter-cultural part of the framework. Acquired companies should be left alone for their first year on standards questions, with very specific exceptions (security, data residency, regulatory compliance, identity integration). The exceptions matter; the rest does not. The architecture function that tries to enforce its standards on a just-acquired business in month three loses both the trust of the acquired team and the political capital it needs for the integration that does matter.

The 30-60-90-180 pattern

A repeatable acquisition-integration playbook, at the architecture layer. The numbers are days from legal close.

Days 1–30: Discover

What happens at the architecture layer: a 30-day discovery exercise on the acquired stack. The output is a written assessment that goes into the integration team's hands.

Specific things the discovery covers:

Identity and access. What identity provider does the acquired business use? How many user accounts? Where are admin credentials held? What is the offboarding process for departing staff?
Data inventory. What datasets does the acquired business produce, and which are now joint-ownership with the parent under the acquisition agreement? Where is this data physically stored? Is any of it subject to regulatory or contractual constraints?
Application portfolio. What applications are in production? Who uses each? Which are SaaS, which are self-hosted, which are custom-built and unmaintained? Which have active vendor support?
Vendor relationships. What contracts exist? When are renewals? What are the costs? Are any of the contracts inherited from a previous owner that we don't have visibility into?
The shadow stack. The spreadsheets, the scripts, the developer-laptop tools that aren't on the official inventory but are load-bearing. These are usually 30% of the actual operation and 0% of the documented one.

The deliverable at day 30 is a written assessment, not a recommendation. The recommendation comes after the integration team has had its parallel commercial and operational reviews.

Days 30–60: Decide

What happens at the architecture layer: a sequenced decision on the acquired stack, with specific categorisation of each application.

The decision categories I have found useful:

Keep, integrate. The acquired application is good enough that it stays, but it needs to be integrated with the parent's identity, data, and operational tooling. Most operational applications fall here.
Keep, federate. The acquired application stays as a separate island, with a thin integration to the parent's federated data model but no deeper integration. Often the right answer for vertical-specific applications where the acquired business knows its own domain better than the parent does.
Migrate. The acquired application is going away; users migrate to the parent's equivalent over a defined timeline. The parent's application must be capable enough to absorb the acquired use case.
Sunset. The acquired application is going away with no direct replacement because it was solving a problem the parent solves differently or doesn't have. Usually a small minority of cases.
Conditional. The application is on probation. Decision deferred for six months while the integration team learns whether the underlying business process needs to change.

Each application in the acquired stack gets one of these labels. The label drives the integration plan. The labels are formally signed off at the AGB.
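
As a minimal sketch of the rule that every application gets exactly one label before sign-off, the categories can be held as an enumeration and the acquired stack as a simple mapping. The application names and the helper functions (`unlabelled`, `by_decision`) are hypothetical and exist only for illustration; the five categories are the ones listed above.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, List, Optional


class Decision(Enum):
    """The five day-60 categories described in the text."""
    KEEP_INTEGRATE = "Keep, integrate"
    KEEP_FEDERATE = "Keep, federate"
    MIGRATE = "Migrate"
    SUNSET = "Sunset"
    CONDITIONAL = "Conditional"


def unlabelled(stack: Dict[str, Optional[Decision]]) -> List[str]:
    """Applications that still have no label and so block AGB sign-off."""
    return sorted(app for app, label in stack.items() if label is None)


def by_decision(stack: Dict[str, Optional[Decision]]) -> Dict[Decision, List[str]]:
    """Group labelled applications by category, ready to drive the integration plan."""
    groups: Dict[Decision, List[str]] = defaultdict(list)
    for app, label in stack.items():
        if label is not None:
            groups[label].append(app)
    return dict(groups)


# Hypothetical acquired stack; the application names are invented for illustration.
stack = {
    "practice-management": Decision.KEEP_FEDERATE,
    "payroll-saas": Decision.MIGRATE,
    "rota-spreadsheet": None,  # found in the shadow stack, not yet categorised
}

print("Blocking sign-off:", unlabelled(stack))
for decision, apps in by_decision(stack).items():
    print(f"{decision.value}: {apps}")
```

The point of the `None` entry is that the shadow stack tends to surface late; a register that cannot represent "not yet decided" will hide exactly those applications.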

Days 60–90: Integrate the non-negotiables

What happens at the architecture layer: identity, data residency, security baseline, financial consolidation. The minimum integration required to run the acquired business as a subsidiary.

This is the part of integration that does not wait for the full plan. Identity has to be unified within ninety days for offboarding to work. Security baseline has to be hit so the acquired company isn't a hole in the parent's posture. Financial consolidation has to be wired so that month-end works. These are non-negotiable and the timeline is tight.

The application-level integrations from the day-60 decisions come later. Days 60–90 is about the floor, not the ceiling.

Days 90–180: Execute the application plan

What happens at the architecture layer: the categorised application decisions from day 60 are now project work. The architecture function transitions from designer to consultant: the integration team executes the plan; architecture is on call for the decisions that come up during execution.

By day 180, most of the application-level integration work is either complete or has a clear plan with named owners and timelines. The acquired business is operating as a normal subsidiary, with clear reporting lines into the parent, with the parent's identity and security in place, and with a clear roadmap for the remaining work.

Past day 180, the acquired business should be unremarkable from an architecture perspective. The function's attention should be on the next acquisition, not the previous one.
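
One way to make the pattern parameterisable per acquisition is to hold the four phases as data and derive the deadlines from the legal close date. This is a sketch under assumptions: the `Phase` type, the `schedule` helper and the example close date are invented for illustration, and the deliverable wording is paraphrased from the sections above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Phase:
    name: str
    start_day: int    # days from legal close
    end_day: int      # deadline, in days from legal close
    deliverable: str


# Phase boundaries follow the text; deliverables are paraphrased.
PLAYBOOK = (
    Phase("Discover", 1, 30, "written assessment of the acquired stack"),
    Phase("Decide", 30, 60, "AGB-signed category for each application"),
    Phase("Integrate the non-negotiables", 60, 90,
          "identity, data residency, security baseline, financial consolidation"),
    Phase("Execute the application plan", 90, 180,
          "application integrations complete or on a named roadmap"),
)


def schedule(legal_close: date) -> List[Tuple[str, date]]:
    """Return (phase name, deadline) pairs for one acquisition."""
    return [(p.name, legal_close + timedelta(days=p.end_day)) for p in PLAYBOOK]


# Hypothetical close date, for illustration only.
for name, deadline in schedule(date(2026, 1, 15)):
    print(f"{deadline.isoformat()}  {name}")
```

Because the next acquisition usually closes before the current one reaches day 180, running `schedule` for each close date gives a quick view of how many phases overlap at any point.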

Vendor consolidation: the actual mechanic

The single highest-value piece of work the architecture function does in an acquisition-heavy business is vendor consolidation. Not capability modelling, not target-state architecture, not standards forums. Vendor consolidation. Because every acquired business arrives with vendor contracts the parent is now paying for, and the durable savings — the actual EBITDA contribution from the M&A work — come from rationalising those contracts.

The mechanic, in detail:

A central register of every contract across every entity. Updated within ninety days of each acquisition. The register records vendor, product, scope, value, renewal date, contractual notice period, and the named relationship owner. This sounds obvious. Most companies do not have it.
A renewal calendar that is the input to the consolidation pipeline. Eight weeks before any contract renewal, the architecture function reviews whether the contract should be renewed as-is, renegotiated, consolidated with the parent's equivalent contract, or terminated. The decision is informed by the application-level decisions from day 60.
A consolidation pipeline that processes one to three contracts per quarter, on average. Not all at once; in sequence. The pipeline is sized to the architecture function's bandwidth, not to the theoretical number of consolidations available. Trying to do twelve consolidations in a quarter produces zero consolidations and a lot of stalled work.
An EBITDA tracker that reports the saved cost back to the finance function quarterly. This is the political win that buys the architecture function the credibility to keep doing the work. Without the tracker, the savings are invisible and the function's value is unmeasurable.
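
The register and the renewal calendar can be sketched as a record type plus one query. The fields mirror the ones named above; the vendor and owner names, the figures, and the reading of "value" as an annual amount are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

REVIEW_LEAD = timedelta(weeks=8)  # the review window described in the text


@dataclass(frozen=True)
class Contract:
    entity: str              # which group company holds the contract
    vendor: str
    product: str
    scope: str
    annual_value: float      # assumption: "value" recorded as an annual figure
    renewal_date: date
    notice_period_days: int
    relationship_owner: str

    @property
    def review_date(self) -> date:
        """Date on which the renew / renegotiate / consolidate / terminate review starts."""
        return self.renewal_date - REVIEW_LEAD


def due_for_review(register: List[Contract], today: date) -> List[Contract]:
    """Contracts inside the review window, soonest renewal first."""
    due = [c for c in register if c.review_date <= today < c.renewal_date]
    return sorted(due, key=lambda c: c.renewal_date)


# Hypothetical register entries; all names and values are invented.
register = [
    Contract("AcquiredCo A", "ExampleCRM", "CRM", "40 seats", 18_000.0,
             date(2026, 9, 1), 30, "J. Doe"),
    Contract("Parent", "ExampleCRM", "CRM", "400 seats", 120_000.0,
             date(2027, 1, 31), 90, "A. N. Other"),
]

for contract in due_for_review(register, today=date(2026, 7, 20)):
    print(contract.entity, contract.vendor, "renews", contract.renewal_date.isoformat())
```

A spreadsheet does the same job at small scale; what matters is that the review date is derived from the renewal date rather than remembered.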

In the company contexts I have seen this work, the architecture function delivers high-single-digit-percentage operating savings per year from this work alone, year after year, indefinitely. It is the most reliable value the function produces. It is also mostly invisible from the outside, which is fine.

The federated data model

The structurally hardest piece of architecture work in an acquisition-heavy business is the data model. Two failure modes sit on either side of the right answer.

Failure mode 1: every acquired company conforms. The parent defines a master schema and every acquired business is required to migrate onto it. This produces six- to eighteen-month migration projects per acquisition, none of which deliver business value during the migration, all of which produce friction with the acquired teams, and many of which fail outright when the acquired business's actual operational needs do not fit the master schema.

Failure mode 2: no integration data model at all. The parent runs its own systems; each acquired company runs its own; nobody attempts to harmonise. This makes consolidated reporting impossible, makes financial close take three weeks instead of three days, and makes any cross-business operational decision require manual data wrangling.

The right answer is a deliberately small federated model. A specific list of data entities that the parent needs to know about at the corporate level, with a clean schema, and explicit federation contracts with each acquired business. The acquired businesses keep their operational schemas; they map them to the federated model on a defined cadence (daily, weekly, monthly depending on the entity).

The entities I would include in a minimum federated model:

Legal entity. Which legal company is each business operating under. Hierarchy of legal entities. Tax registrations.
Customer of record. Unique customer identifier across the group, where the same customer exists in multiple businesses.
Employee. All employees of any group company, mapped to the identity provider.
Application. What applications exist across the group, with ownership.
Vendor. What vendor contracts exist across the group, joined to applications.
Financial transaction (aggregated). Revenue, cost, and EBITDA at the granularity the CFO needs for consolidation. Not individual line items.

That is six entities. Most attempts at a federated model go to twenty or thirty entities and collapse under the weight. Six is enough for ninety percent of the consolidation use cases and small enough to actually maintain.
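
A federation contract can be sketched as a small record: which business, which federated entity, what cadence, and a mapping function from the local schema. Everything below is illustrative. The field names, the dental-practice row, the legal entity identifier, and the use of a normalised email address as the group-wide customer key are assumptions, not a recommended matching rule.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# The six entities of the minimum federated model, as listed in the text.
FEDERATED_ENTITIES = (
    "legal_entity", "customer_of_record", "employee",
    "application", "vendor", "financial_transaction",
)
CADENCES = ("daily", "weekly", "monthly")


@dataclass(frozen=True)
class CustomerOfRecord:
    group_customer_id: str   # unique across the group
    legal_entity_id: str     # which group company holds the relationship
    local_customer_id: str   # key in the acquired business's own schema


@dataclass(frozen=True)
class FederationContract:
    business: str
    entity: str
    cadence: str
    to_federated: Callable[[Mapping[str, str]], CustomerOfRecord]

    def __post_init__(self) -> None:
        # Keep the model small: refuse anything outside the agreed entities.
        if self.entity not in FEDERATED_ENTITIES:
            raise ValueError(f"{self.entity!r} is not in the federated model")
        if self.cadence not in CADENCES:
            raise ValueError(f"unknown cadence {self.cadence!r}")


def map_dental_patient(row: Mapping[str, str]) -> CustomerOfRecord:
    """Hypothetical mapping from one acquired business's local schema."""
    return CustomerOfRecord(
        group_customer_id=row["email"].strip().lower(),  # illustrative key only
        legal_entity_id="LE-DENTAL-01",                  # invented identifier
        local_customer_id=row["patient_no"],
    )


contract = FederationContract(
    "Dental practice A", "customer_of_record", "weekly", map_dental_patient
)
record = contract.to_federated({"patient_no": "P-0042", "email": " Pat@Example.com "})
print(record)
```

The acquired business keeps its own `patient_no`; only the mapping is owned jointly, which is what keeps the architecture function in the keeper role rather than the harmoniser role.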

The cultural component

A short and important note. The acquired teams have feelings about what is happening to them. The integration team — and the architecture function specifically — is the visible face of the parent's decisions about the acquired business's stack. Those decisions feel personal because they often involve sunsetting a system the acquired team built or stopping using a vendor the acquired team chose.

The right posture is to listen first, make decisions second, and communicate decisions transparently. Specifically:

The day-30 discovery is a real conversation with the acquired team, not just a read of the documentation. The conversation is the input to the day-60 decision.
The day-60 decision is communicated in writing, with reasons, and is open to challenge in a specified window before it becomes final. "Keep, federate" and "Conditional" are both used deliberately because they signal "we are not yet making the hard decision, and your input matters". "Sunset" is used sparingly and always with named replacement.
The acquired team retains influence over their stack for at least the first year, even where the parent's architecture team has views. The exceptions are the non-negotiables (security, identity, regulatory) where the parent's posture is final.

This is not soft-skills theatre. It is the durable mechanism by which the integration succeeds. The acquired team's institutional knowledge of why their stack looks the way it does is the most valuable input the architecture function has. The function that ignores that input makes worse decisions and loses the trust it needs for the deeper integration work that comes later.

What it means for the architecture function's structure

The function in an acquisition-heavy business looks different from the textbook function.

Smaller standards forum. The Architecture Governance Board meets monthly, not weekly, and focuses on the consequential decisions: the day-60 categorisations, the federation contracts, the security exceptions, the genuinely cross-business architectural choices. Not the everyday-application questions.
A dedicated acquisition architecture lead. One person whose full-time job is the integration pattern: shepherding each acquisition through the 30-60-90-180 cycle. This role is the thing most acquisition-heavy companies are missing.
A vendor management partnership. The architecture function and the procurement function are tightly partnered, not at arm's length. The contract register is co-owned. The consolidation pipeline is co-run.
Lightweight capability modelling. A simple capability map exists at the corporate level, with fewer than fifty top-level capabilities, and it is updated quarterly rather than maintained as a major artefact. It serves as orientation, not as a planning tool.
A clear separation between architecture and integration execution. Architecture is on the design and the pattern; integration is on the execution. Both functions exist; they are not the same team.

This shape is leaner than the textbook EA function and busier per head. It is also the shape that actually fits the business model.

What this looks like when it is working

A short picture of the steady state to aim at.

Each acquisition closes legally, and within 30 days the architecture function has produced a written assessment of the acquired stack.
Within 60 days the AGB has signed off on the application-level categorisations and the integration plan.
Within 90 days the non-negotiables (identity, security, data residency, financial consolidation) are in place.
Within 180 days the application-level integrations are on a named roadmap with owners.
Vendor consolidation produces predictable EBITDA contribution quarterly.
The federated data model is stable; the entities in it are unchanged from one year to the next.
The architecture function is small relative to the size of the group, but the integration pattern works repeatably.
Each acquisition is cheaper to integrate than the last, because the pattern is mature and the federation is established.

The signal that the function is working is not the absence of incidents or the elegance of the target-state diagram. It is the unit economics of integration. If the next acquisition costs less to integrate than the previous one, the function is doing its job. If not, the function is failing in some specific way that the framework above lets you diagnose.
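
The unit-economics test is simple enough to write down. The sketch below assumes integration cost is tracked as a single number per acquisition, in close order; the unit (person-days here) and the figures are invented for illustration.

```python
from typing import List, Optional


def cost_deltas(costs: List[float]) -> List[float]:
    """Change in integration cost from each acquisition to the next (negative = cheaper)."""
    return [later - earlier for earlier, later in zip(costs, costs[1:])]


def first_regression(costs: List[float]) -> Optional[int]:
    """0-based index of the first acquisition that was not cheaper than the one before it."""
    for index, delta in enumerate(cost_deltas(costs), start=1):
        if delta >= 0:
            return index
    return None


# Hypothetical integration costs in person-days, in order of legal close.
costs = [420.0, 380.0, 390.0, 310.0]

print("Deltas:", cost_deltas(costs))
print("First regression at acquisition index:", first_regression(costs))
```

A regression is a prompt to look at which part of the framework slipped for that acquisition (discovery, categorisation, the non-negotiables, or execution), not a verdict on its own.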

Why this isn't written about

The framework is under-discussed because it does not fit the prestige patterns of enterprise architecture practice. The prestige patterns — TOGAF certifications, capability-model deep-dives, the long-form ADRs that get praised on LinkedIn — are calibrated for stable enterprises. Acquisition-heavy work is more operational, more about pattern repetition than about elegant single artefacts, and more about EBITDA than about elegance.

The people who do it well tend not to write about it. The architecture press writes about the prestige patterns. The PE-backed roll-up architects are heads-down doing the integration work and probably not writing about it because the work itself is what they are paid for.

I have been on the inside of this pattern. The framework above is mine; if it is useful in your context, I would like to hear how it lands. If you are reading this and you are the architecture lead inside a roll-up business, please consider writing about the pattern as well. The discipline benefits from more honest writing about what actually works.

Meridian: building the EA platform we couldn't buy — describes the EA platform that was deliberately designed to cope with a fast-changing portfolio rather than a stable one.
CANVAS: building the approval workflow no commercial product covers — the workflow that wraps the acquisition-onboarding gate at the application level.
The commercial EA tool market has 18 months — the same shift, viewed from the vendor side.

Sovereign AI is mostly theatre. The actual technical question is data residency

2026-06-11T00:00:00.000Z

TL;DR

"Sovereign AI" is the framing every European government and most European enterprise software vendors are using in 2026 to describe the goal of running AI infrastructure without dependence on US hyperscalers. The political case is straightforward. The technical delivery is much messier than the framing suggests. Most "sovereign AI" propositions in the market today are either repackaging a US provider behind a European brand, building a non-frontier-class model with EU funding, or solving a different problem (data residency, model provenance, supply chain) and calling the bundle "sovereignty".

For an architect making real decisions in 2026 about where to run inference, where to store the data that feeds the models, and what to commit to in vendor contracts, the useful move is to separate the political question from the technical questions. The political question is whose flag is on the press release. The technical questions are three: where does the inference physically run, where does the data that the model sees physically live, and under whose jurisdiction is the operating company subject to discovery, subpoena, or compelled-access orders. Each of these has a clean answer for a given deployment. None of them are answered by buying a "sovereign AI" product per se.

This piece is the framework I would use to make the call, with specific patterns for the regulated enterprise context.

What "sovereign AI" usually means in 2026

The category is doing too much work. When a European government, a hyperscaler, a regional cloud provider, or a vendor uses "sovereign AI" in 2026, they typically mean one of:

A European frontier model. Mistral is the canonical example. Aleph Alpha was an earlier one. The pitch is that the model itself is European-trained, European-controlled, and competitive with the US frontier. The reality is that the gap to the US frontier has been narrowing on some metrics and widening on others, with the result that "sovereign frontier model" is a credible product for some workloads and not for others.
A European cloud running third-party models. OVHcloud running Mistral. Scaleway hosting Llama. The various "EU Bedrock-equivalent" offerings from European hyperscalers. The pitch here is that the infrastructure is European even if the model came from elsewhere.
A US hyperscaler running in EU regions with EU contractual controls. Microsoft's EU Data Boundary, AWS's European Sovereign Cloud, Google's Sovereign Cloud arrangements. The pitch is that the US provider can demonstrate that data stays in the EU and is subject to EU contractual controls even though the operating company is US.
A managed-services wrapper around any of the above. The service provider takes responsibility for "sovereignty" as a service offering. The customer doesn't have to design the controls themselves; they outsource the question to a vendor.
A pure marketing claim with no underlying architecture. This category exists. The vendor uses "sovereign AI" as a positioning word with no specific technical commitment behind it. Worth being able to recognise.

The five things are not the same. They have different cost profiles, different risk profiles, different capability profiles, and different legal postures. Treating "sovereign AI" as a single category obscures all of that.

The political moment

It is worth being honest about why "sovereign AI" is having the moment it is having.

A combination of: the EU AI Act compliance regime now in active enforcement (the second tranche of obligations landed in 2025, the third in 2026), the broader European tech-sovereignty push that predates AI, the geopolitical realignment around US-EU technology relations under the current US administration, the specific concerns that the Cloud Act creates for any EU data hosted by US companies, and a cohort of European AI investors who genuinely believe the next decade of model competition has space for non-US-aligned alternatives.

All of this is real and all of it produces real announcements and real funding flows. Several of the announcements over the past eighteen months are substantive. Several are political theatre. The architect's job is to be able to tell them apart at the specification level.

The three technical questions

The political framing — "is this sovereign?" — is not actionable. The actionable questions are three. Every architecture decision about AI deployment in a regulated European context can be decomposed into them.

Question 1: Where does the inference physically run?

The literal compute. The GPUs. The data centre. The geographic location of the silicon doing the matrix multiplications when your prompt is being processed.

This is the most concrete and most easily auditable of the three questions. The answer can be a city, a region, or a country. For EU regulatory purposes it usually needs to be a member state. For some specific national rules (FCA outsourcing rules, some defence contracts, some healthcare deployments) it needs to be the specific country.

The answer varies by vendor and by deployment configuration:

OpenAI API directly: US, primarily. EU data residency is available on enterprise contracts but the underlying inference may still route through US-based infrastructure for some workloads. Read the terms.
Anthropic API directly: US, primarily. Similar enterprise options.
Anthropic via AWS Bedrock: Customer-controlled. Pick an EU region; the inference runs there.
Anthropic via Google Vertex AI: Customer-controlled. Pick europe-west4 (Netherlands) or similar; the inference runs there.
Azure OpenAI: Customer-controlled. Pick West Europe or Sweden Central; the inference runs there.
Mistral via Mistral La Plateforme: France.
Self-hosted Mistral / Llama / Qwen on European infrastructure: Whichever region you deployed to.

The pattern: the question of where inference runs is a procurement and configuration question, not a "sovereign vs not" question. You can have EU-resident inference from US-based vendors. You can also have non-EU-resident inference from European-branded products if you don't configure carefully.

For most regulated workloads, the answer should be an EU member state and should be contractually committed. The vendor's willingness to commit to this is itself a signal of whether they are operating at enterprise grade.

Question 2: Where does the data the model sees physically live?

A different question. The model is doing inference somewhere. The data the model is reasoning over — the documents in your RAG corpus, the customer records the agent is acting on, the documents attached to the prompt — is being sent to that inference endpoint. The data must travel to where the inference runs. It may also be cached, logged, or retained somewhere along the way.

The data-residency question is therefore: across the full inference path, where does the data live, even transiently?

Specifically:

The corpus. Where is the vector database? Where are the source documents? Where are the embeddings stored?
The prompt and response in flight. What region does the network path traverse? Where are the load balancers?
The prompt and response at rest. Does the vendor retain prompts and responses? For how long? In what region?
The model weights, if you are fine-tuning. Where are the training datasets stored, where is the training compute, where are the resulting weights stored?

For a typical RAG-based agent in 2026, the corpus is the part most companies handle correctly (it lives in the company's own systems, usually in a regional managed service) and the inference logs are the part most companies handle incorrectly (the vendor's retention defaults are often longer than the company's policy, and the region may be different).

The audit-grade answer to this question is: every byte of data that touches the model — going in or coming out — has a specified region of residence at every stage, and the vendor contract matches the specification. Where the contract is silent, the default is whatever the vendor's general infrastructure does, which is usually US-routed.
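
The audit-grade answer can be sketched as a check over the stages listed above: each stage needs a known region, the region must be allowed, and the contract must commit to it. The region labels, the `StageResidency` type and the `audit` helper are placeholders invented for this example, not any vendor's naming.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence, Set

# Stages from the text; "fine_tuning" is only relevant when weights are trained.
DEFAULT_STAGES = ("corpus", "in_flight", "at_rest")


@dataclass(frozen=True)
class StageResidency:
    stage: str
    region: Optional[str]           # where the data lives at this stage; None if unknown
    contract_region: Optional[str]  # region the vendor contract commits to; None if silent


def audit(path: Iterable[StageResidency],
          allowed_regions: Set[str],
          stages: Sequence[str] = DEFAULT_STAGES) -> List[str]:
    """Return one finding per stage that lacks a specified, allowed, contracted region."""
    by_stage = {entry.stage: entry for entry in path}
    findings = []
    for stage in stages:
        entry = by_stage.get(stage)
        if entry is None or entry.region is None:
            findings.append(f"{stage}: region not specified")
        elif entry.region not in allowed_regions:
            findings.append(f"{stage}: {entry.region} is outside the allowed regions")
        elif entry.contract_region is None:
            findings.append(f"{stage}: contract is silent")
        elif entry.contract_region != entry.region:
            findings.append(
                f"{stage}: contract commits to {entry.contract_region}, not {entry.region}"
            )
    return findings


# Hypothetical deployment: the corpus is handled correctly, the vendor's log retention is not.
path = [
    StageResidency("corpus", "eu-central", "eu-central"),
    StageResidency("in_flight", "eu-central", "eu-central"),
    StageResidency("at_rest", "eu-central", None),
]

for finding in audit(path, allowed_regions={"eu-central", "eu-west"}):
    print(finding)
```

Treating a silent contract as a finding rather than a pass reflects the point above: where the contract says nothing, the vendor's general default applies.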

Question 3: Under whose jurisdiction is the operating company subject to compelled access?

The hardest of the three to reason about because it is about legal process rather than technical configuration.

A US company operating a cloud in the EU is, under the Cloud Act, potentially subject to US compelled-access orders for data the company holds, regardless of where the data is physically stored. This is the core argument against US-headquartered hyperscalers hosting truly sensitive EU data. The US company can be ordered by a US court to produce the data, and complying with that order may put them in conflict with EU law (specifically GDPR's restrictions on third-country data transfers).

The mitigations:

EU Data Boundary and equivalent. Microsoft, AWS, and Google have all rolled out variants of contractual and technical arrangements designed to address Cloud Act concerns. The details matter. Some of these arrangements are substantive (separate legal entities incorporated in the EU, with EU staff, EU encryption keys held by EU entities, technical separation of the EU infrastructure from US operations). Some are less so. Read the actual terms.
EU-incorporated operating companies. A truly EU-resident cloud (operated by an EU-incorporated company, with no US parent in the legal structure and no US staff with admin access) is generally outside the reach of the Cloud Act, provided the company has no US presence that would bring it under US jurisdiction. OVHcloud, Scaleway, Aruba, and the various national cloud initiatives fall in this category.
On-premise deployment. The company runs the inference inside its own data centre, on its own infrastructure, using a model it has the right to run (typically an open-weights model). No third-party operator at all. No Cloud Act exposure because no US company is involved.

The third question is the one where "sovereign AI" framing comes closest to making sense, because it is genuinely about jurisdictional sovereignty. But it is also the question with the largest gap between political claim and technical reality. Most "sovereign AI" offerings address questions 1 and 2 but punt on question 3.

A decision framework

For an architect making a deployment decision, the question is not "is this sovereign". It is: "for this specific workload, with this specific sensitivity level, what answers do I need to questions 1, 2, and 3, and which deployment configurations satisfy them?"

A reasonable framework, in order of escalating constraint:

Tier 0: Public, non-sensitive workloads

Marketing copy generation. Internal-document summarisation of non-confidential material. Developer-tooling LLM use against non-confidential code.

Answers needed: Q1 and Q2 should be in-region (EU for EU operations), via contractual commitment. Q3 is generally not a concern at this tier.

Acceptable deployments: Any major LLM vendor with an EU inference region and standard enterprise terms. OpenAI in EU, Anthropic via Bedrock EU, Mistral, Azure OpenAI in EU, Vertex AI in EU.

Tier 1: Confidential business data

Internal architecture documents. Internal financial planning data. Internal HR data (non-PII). Source code where commercial sensitivity is moderate.

Answers needed: Q1 in-region with contractual commitment. Q2 in-region with contractual commitment, including retention terms and incident-response data handling. Q3 should be considered: a US-headquartered hyperscaler running an EU region with strong contractual controls is acceptable; pure consumer-tier vendor endpoints are not.

Acceptable deployments: Azure OpenAI in EU on an enterprise contract with EU Data Boundary attestation. Anthropic via Bedrock EU on a similar contract. Mistral La Plateforme for workloads where the model quality is sufficient. Self-hosted Llama or Mistral on EU-resident infrastructure for workloads where data-residency control is paramount.

Tier 2: Regulated PII, financial customer data, healthcare records

Customer transaction data. Patient records. KYC documentation. Anything covered by financial-services or healthcare regulation.

Answers needed: Q1, Q2, and Q3 all need strong answers. Q3 is now actually the binding constraint: the legal exposure to US compelled access becomes a meaningful concern, depending on the specific regulator's view.

Acceptable deployments:

For most workloads, US hyperscaler in EU region with full EU Data Boundary controls is still acceptable, provided the company is comfortable with the residual Cloud Act exposure. Most large EU banks operate at this tier.
For workloads where Cloud Act exposure is unacceptable: an EU-incorporated cloud provider running an EU-developed model (Mistral on OVHcloud, for example), or self-hosted infrastructure with open-weights models.
The decision is partly about the workload's regulatory exposure and partly about the regulator's known posture on the question. The Dutch DNB and the German BaFin have been more conservative on US-hyperscaler exposure than some others. Calibrate to your primary regulator.

Tier 3: Defence, national-security-adjacent, certain forms of intelligence

Out of scope for most of the readers of this post. The pattern is on-premise, air-gapped, open-weights, with the entire inference path under the operating company's physical control.
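One way to make the tiers checkable rather than just readable is to encode them as a small function. The following is a minimal sketch, not a compliance tool: the `Deployment` fields, the `acceptable` function and the three example deployments are hypothetical names introduced here, and the booleans stand in for facts you would have to establish from the actual contract.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0        # Tier 0: public, non-sensitive workloads
    CONFIDENTIAL = 1  # Tier 1: confidential business data
    REGULATED = 2     # Tier 2: regulated PII, financial, healthcare
    DEFENCE = 3       # Tier 3: defence and national-security-adjacent


@dataclass(frozen=True)
class Deployment:
    """Hypothetical summary of what the contract and architecture say."""
    name: str
    inference_in_region: bool   # Q1: contractual commitment on inference region
    data_in_region: bool        # Q2: contractual commitment on data residency
    strong_data_terms: bool     # Q2: retention and incident-response terms
    enterprise_contract: bool   # False for consumer-tier endpoints
    us_parent: bool             # Q3: a US company sits in the legal structure
    air_gapped_on_prem: bool = False


def acceptable(d: Deployment, tier: Tier, cloud_act_binding: bool = False) -> bool:
    """Return True if the deployment meets the tier's minimum answers."""
    if tier == Tier.DEFENCE:
        return d.air_gapped_on_prem
    ok = d.inference_in_region and d.data_in_region
    if tier >= Tier.CONFIDENTIAL:
        ok = ok and d.strong_data_terms and d.enterprise_contract
    if tier >= Tier.REGULATED and cloud_act_binding:
        # Q3 is the binding constraint: no US company in the chain.
        ok = ok and not d.us_parent
    return ok


hyperscaler_eu = Deployment("US hyperscaler, EU region", True, True, True, True, us_parent=True)
eu_cloud = Deployment("EU-incorporated cloud", True, True, True, True, us_parent=False)
consumer = Deployment("Consumer-tier endpoint", True, True, False, False, us_parent=True)

for d in (hyperscaler_eu, eu_cloud, consumer):
    print(d.name, acceptable(d, Tier.REGULATED, cloud_act_binding=True))
```

The `cloud_act_binding` flag is the regulator-posture question from Tier 2: the same hyperscaler deployment passes or fails depending on whether your primary regulator treats the residual exposure as acceptable.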

Where the European model providers actually win

A fair assessment of where Mistral and the smaller European players genuinely deliver value rather than just political framing.

On data-residency questions where the customer wants to deal with an EU-incorporated counterparty rather than a US one. Mistral La Plateforme is operated by a French company. Its contract is governed by French law. There is no US holding company in the chain. For Tier 2 workloads where Cloud Act exposure is the specific concern, this is a real technical answer, not just positioning.

On model quality for non-frontier workloads. Mistral's smaller models (in 2026, its latest mid-tier releases) are competitive with US mid-tier models on many evals. For workloads where you do not need frontier capability, the European model is a viable choice on capability alone, and the residency story is a bonus.

On open-weights availability for self-hosting. Mistral has been one of the more consistent providers of weights that can actually be deployed on customer infrastructure. For workloads where self-hosting is the architectural requirement, the open weights from Mistral and from Meta's Llama family are the practical choice.

On specific language strength. Some European models have strength in specific European languages (French, German, Spanish) that exceeds the major US models. For workloads where the working language is something other than English, this is occasionally a meaningful capability difference.

Where they don't matter

A fair assessment of where the European positioning is mostly marketing.

On frontier-class capability. As of mid-2026, the frontier of capability — the most demanding agentic reasoning, the most complex tool use, the most reliable code generation — sits with the major US frontier models. The gap has been narrowing on some benchmarks and not narrowing on others. If your workload genuinely requires frontier capability, the European alternatives are not yet substitutable.

On managed enterprise tooling around the model. The vendor ecosystem around the major US providers — observability tools, prompt-management platforms, evaluation frameworks, deployment patterns, integration platforms — is significantly more mature than the equivalent ecosystem around European providers. This matters for production deployments at scale.

On Q3 if the workload's regulatory exposure is low. For most Tier 0 and Tier 1 workloads, the Cloud Act concern is theoretical. Choosing a European provider specifically to address Q3 when Q3 is not actually a binding constraint is over-engineering.

A concrete recommendation

If I were the Chief Architect of a regulated European company today, making the call about AI infrastructure for the next two years, here is what I would actually do.

For Tier 0 workloads: Standardise on whichever frontier model the company has the strongest enterprise relationship with, deployed in an EU region with standard contractual residency commitments. Do not over-engineer the sovereignty question.

For Tier 1 workloads: Same as Tier 0, with stronger contractual commitments on data handling, retention, and incident response. The Bedrock or Vertex AI path with explicit region selection. Do not run consumer-tier endpoints.

For Tier 2 workloads with Cloud Act exposure as a binding constraint: Mistral via Mistral La Plateforme, or self-hosted open-weights model on an EU-incorporated cloud. Accept the capability trade-off; for these workloads, the regulatory clarity is worth more than the marginal model quality.

For Tier 2 workloads where Cloud Act exposure is acceptable to the regulator: US hyperscaler in EU region with strong contractual controls (EU Data Boundary or equivalent). Most of the EU regulated sector is operating at this tier today and it is workable.

For Tier 3 workloads: Out of scope for this piece. Talk to your sector-specific authority.

The thing I would not do is pick a "sovereign AI" product because the brand says sovereign. Read the contract, check the inference region, check the data-residency commitments, check the operating company's jurisdiction. Then make the call on the merits.

The next two years

A prediction. The "sovereign AI" framing peaks in 2026 and 2027, then transitions into a more technical conversation about data-residency engineering as the political moment passes and the real architecture decisions get made. The vendors that survive this transition are the ones whose technical claims hold up — not the ones with the best press release.

Specifically:

Mistral becomes a credible mid-tier European choice for regulated workloads. Its frontier-model ambitions either succeed or stall; either way, the mid-tier business is durable.
The US hyperscalers' EU Data Boundary arrangements get contractually stronger as their European customer base demands more. Microsoft, AWS, and Google all add layers of EU-control through 2026 and 2027. By 2028 these arrangements look much more like proper sovereign clouds for most practical purposes, although the Cloud Act exposure does not fully go away.
Open-weights models become the default for the most sensitive workloads. Self-hosting on EU infrastructure is the answer for the cases where Q3 is the binding constraint. The ecosystem around self-hosting (deployment patterns, observability, evaluation, fine-tuning) matures rapidly through 2026 and 2027.
The political framing fades. By 2028 "sovereign AI" is no longer the headline framing. The conversation is about specific technical commitments, just as the conversation about "cloud sovereignty" — which had its own moment in 2018-2020 — became technical rather than political over time.

The architects who do well in the next two years are the ones who stay grounded in the three technical questions and don't get distracted by the framing. The political conversation will resolve itself. The technical decisions will be on the architecture function's books long after.

If you are reading vendor proposals right now with "sovereign AI" in the title and you have not separately verified the answers to questions 1, 2, and 3 for the specific deployment configuration being proposed, please go back and do that before you sign.

This piece sits adjacent to:

How do you audit a decision an agent made? A working framework. — The audit story for agent decisions is closely related to the data-residency story; both are about where evidence lives and whose jurisdiction it lives under.
MCP is the most important enterprise standard nobody is implementing. — The integration layer through which inference happens; data residency cuts across this directly.

MCP is the most important enterprise standard nobody is implementing

2026-06-04T00:00:00.000Z

TL;DR

Anthropic introduced the Model Context Protocol in November 2024. By mid-2026 it has been adopted, in some form, by OpenAI, Google, the major IDE vendors (Cursor, Claude Code, VS Code via Copilot Chat), the major productivity vendors (Notion, Linear, Atlassian, Microsoft via M365 Copilot extensions), and a long tail of infrastructure providers. The protocol does one specific thing well: it standardises how an LLM gets access to external tools, context, and data, with a security and discovery model that generalises across vendors.

In consumer and developer-tooling contexts, MCP has won. In enterprise contexts — by which I mean regulated companies running agents in production over internal data — MCP adoption is poor. Most internal agent builds I see in 2026 are still using custom-per-vendor function calling, hand-rolled RAG pipelines, direct API integrations with each downstream system, or some proprietary middleware layer that the architecture function is trying not to think about.

This is a missed opportunity. MCP is the right abstraction. It is more secure than the bespoke alternatives most companies are running. It removes a category of vendor lock-in. It standardises the audit story for tool calls. And the cost of adopting it is low and falling.

This piece covers what MCP is, why enterprise adoption has lagged, the arguments against it and the answers to them, and three integration patterns I would use today for a regulated enterprise.

What MCP is, in one paragraph

Model Context Protocol is an open protocol — JSON-RPC over a few transport options, with a small set of standardised methods — that defines how an AI agent (the "client") connects to an external data or capability provider (the "server"). The server exposes a discoverable list of tools (callable functions), resources (retrievable content), and optionally prompts (reusable templates). The client connects, discovers what is available, and calls into the server during the model's reasoning loop. The protocol handles authentication, capability discovery, streaming, error handling, and — critically — the audit-relevant metadata around every call.

In effect, MCP is to AI tooling what LSP (the Language Server Protocol) was to IDE tooling. LSP let any editor talk to any language's tooling without each editor re-implementing the intelligence for every language. MCP lets any model talk to any data source or tool without each model vendor re-implementing the connector for every system.
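To make the wire format concrete, here is a minimal sketch of the two messages at the centre of the protocol: a `tools/list` discovery request and a `tools/call` invocation, both as JSON-RPC 2.0. The tool name `search_tickets` and its arguments are hypothetical, the response is an illustrative example of the result shape rather than output from a real server, and the initialisation handshake that precedes these calls in a real session is omitted.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request IDs; any unique value per request works


def rpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Discovery: ask the server which tools it exposes.
list_tools = rpc_request("tools/list")

# Invocation: call one tool by name with structured arguments.
call_tool = rpc_request(
    "tools/call",
    {"name": "search_tickets", "arguments": {"query": "login failure", "limit": 5}},
)

# Illustrative response: the result carries typed content blocks and an
# isError flag, and echoes the request ID so the two can be correlated.
response = {
    "jsonrpc": "2.0",
    "id": call_tool["id"],
    "result": {
        "content": [{"type": "text", "text": "3 matching tickets"}],
        "isError": False,
    },
}

print(json.dumps(call_tool, indent=2))
```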

Why this matters

The status quo without MCP is well-known to anyone who has built an agent against multiple downstream systems. A typical internal agent today reaches into Confluence for documentation, into Jira for tickets, into Salesforce for customer records, into the company data warehouse for analytics, into the CI system for build status, into an internal directory for people lookups, and into half a dozen smaller internal systems. Each of those integrations is either:

A bespoke OpenAPI-driven tool definition wrapped in a vendor-specific function-calling format (Anthropic's, OpenAI's, Google's — all similar but not identical).
A retrieval pipeline that ingests data from the source, embeds it, stores it in a vector database, and surfaces it through RAG.
A direct API call from the agent code, with the model's reasoning injecting parameters into the call.

Each of these is brittle in its own way. Bespoke tool definitions break when the agent moves to a different model vendor; you re-write the wrappers. RAG pipelines drift from the source data and require re-ingestion every time the source schema changes; they also lose the fine-grained permissions model of the source system. Direct API calls couple the agent code tightly to the downstream system's API surface; you re-write the integration every time the API changes.

MCP gives you a different shape. The downstream system exposes an MCP server. The agent, regardless of model vendor, talks to that server through the protocol. The server handles permissions, the discovery, the streaming, the structured responses. The agent code doesn't change when you switch models. The integration doesn't change when you switch agents. The downstream system's API can evolve independently of the agent layer.
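The "similar but not identical" problem is easy to see in code. The sketch below takes one tool definition in the shape an MCP server advertises it and adapts it to two vendor function-calling shapes. The tool `lookup_employee` is hypothetical, and the vendor shapes reflect my understanding of the Anthropic Messages API and the OpenAI Chat Completions API at the time of writing; treat them as assumptions and check the current vendor documentation.

```python
# One tool, described once, in the shape an MCP server lists it.
mcp_tool = {
    "name": "lookup_employee",  # hypothetical tool
    "description": "Look up an employee record by work email address.",
    "inputSchema": {
        "type": "object",
        "properties": {"email": {"type": "string"}},
        "required": ["email"],
    },
}


def to_anthropic(tool: dict) -> dict:
    """Anthropic-style tool definition: the schema key is input_schema."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["inputSchema"],
    }


def to_openai_chat(tool: dict) -> dict:
    """OpenAI Chat Completions-style definition: nested under 'function'."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["inputSchema"],
        },
    }


print(to_anthropic(mcp_tool))
print(to_openai_chat(mcp_tool))
```

Without MCP, every integration team writes and maintains adapters like these per vendor. With MCP, the adaptation happens once inside the MCP-capable client, and the downstream system only ever publishes the first shape.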

There are several specific implementation wins:

Fine-grained permission delegation. MCP servers receive the end-user's identity (via OAuth-style flows where the server is the resource server and the MCP client presents a bearer token issued on the user's behalf). The permissions the server applies are the permissions of the actual user. This is the correct security model for an enterprise tool; the alternative is service-account-with-superset-permissions, which is the source of half the data-leak incidents I have seen with internal agents.
Discoverability over hand-wired configuration. The agent discovers what tools and resources are available rather than having them baked into a config file. New capability becomes available without re-deploying the agent. Removed capability becomes unavailable without manual cleanup.
Standardised audit metadata. Every MCP tool call carries a request ID, a tool name, a parameter set, and a return value, and a timestamp is trivially stamped at the point of logging. The structure is the same regardless of which server you are calling. The audit framework I wrote about earlier becomes much easier to implement uniformly.
Transport portability. Stdio for local servers, Streamable HTTP (the successor to the original HTTP+SSE transport) for remote servers, custom transports where needed. The agent code doesn't care which.
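The audit point can be sketched as a single function that turns any `tools/call` request and its response into one uniform record. This is an illustration under stated assumptions: the `AuditRecord` fields are a hypothetical schema, the timestamp is stamped by the logger rather than taken from the protocol message, and the result is stored as a hash on the assumption that the full payload lives elsewhere.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    request_id: str
    timestamp: str       # stamped at the logging point, ISO 8601 UTC
    server: str
    user: str
    tool: str
    arguments: dict
    result_sha256: str   # digest of the result, not the payload itself
    is_error: bool


def audit_record(server: str, user: str, request: dict, response: dict,
                 now: Optional[datetime] = None) -> AuditRecord:
    """Build the same record shape for any server and any tool."""
    now = now or datetime.now(timezone.utc)
    result = response.get("result", {})
    digest = hashlib.sha256(
        json.dumps(result, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return AuditRecord(
        request_id=str(request["id"]),
        timestamp=now.isoformat(),
        server=server,
        user=user,
        tool=request["params"]["name"],
        arguments=request["params"].get("arguments", {}),
        result_sha256=digest,
        is_error="error" in response or bool(result.get("isError", False)),
    )


# Hypothetical call and response, in the JSON-RPC shape MCP uses.
request = {"jsonrpc": "2.0", "id": 7, "method": "tools/call",
           "params": {"name": "search_tickets", "arguments": {"query": "vpn"}}}
response = {"jsonrpc": "2.0", "id": 7,
            "result": {"content": [{"type": "text", "text": "2 tickets"}],
                       "isError": False}}

record = audit_record("jira-mcp", "alice", request, response,
                      now=datetime(2026, 6, 4, tzinfo=timezone.utc))
print(record)
```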

Where adoption actually is

The disconnect between MCP's design quality and enterprise adoption is real and worth being specific about.

In consumer tooling, MCP has effectively won. Claude Desktop, Cursor, Claude Code, Continue, Windsurf, and a growing set of other clients all speak MCP natively. The ecosystem of public MCP servers is in the hundreds and growing: GitHub, GitLab, Linear, Notion, Slack, Sentry. Every major SaaS vendor has either an official server or a community one.

In AI-first companies and developer-tooling startups, MCP is the default. The pattern is: build the product, expose an MCP server, let the customer's agent connect.

In regulated enterprise, the picture is different. The architecture functions I have talked to in financial services, healthcare, utilities, and government are in some combination of these states:

Aware of MCP but treating it as a "consumer thing".
Concerned about the security model and waiting for "enterprise MCP".
Running pilots but not in production.
Building bespoke wrappers that solve the same problem badly.

The mismatch is striking. The protocol is more rigorous than what most companies have built internally. The security model is better than service-account-everything. The audit story is cleaner. And yet the adoption curve is shallow in exactly the segment that would benefit most.

Why enterprise hasn't moved

The reasons I have actually heard, ranked roughly by how often I hear them:

"It feels too new"

The honest one. MCP is eighteen months old. Enterprise procurement cycles do not move at eighteen-month tempo. The CISO will not sign off on a protocol that doesn't yet have multiple cycles of production deployment in regulated industries behind it.

This is a reasonable concern but it has gotten weaker over time. By mid-2026, MCP has been deployed by enterprises including several large banks, healthcare providers, and at least one defence contractor I have seen a reference for. The "too new" objection is becoming "too new for our particular sector, where we want to see two more reference customers first". That last layer of friction peels off through 2026 and 2027.

"It feels like vendor lock-in"

This one is wrong but understandable. The intuition is that MCP is Anthropic's protocol and adopting it ties you to Anthropic. The reality is the opposite: MCP is the thing that breaks the vendor lock-in that bespoke tool wrappers were creating.

If your agent is built against Anthropic's function-calling API specifically, you have an Anthropic-shaped integration that needs rewriting if you ever move to a different model. If your agent is built against MCP, the same MCP server works with any MCP-capable client, regardless of which model vendor is behind it. Adopting MCP is the un-locking move, not the locking move.

The protocol itself is open. The spec is on GitHub under an MIT-style licence. Anthropic, OpenAI, Google, and the major IDE vendors all participate in the spec. It is not Anthropic's proprietary thing.

"The security model is not enterprise-grade"

This is partially right and getting less right over time.

The MCP spec as initially shipped had some gaps from an enterprise-security perspective. Specifically, the auth flow was under-specified, the discovery layer didn't have a great answer for "how does the client know which servers are approved", and the per-tool permissions story was thin.

By mid-2026 most of these gaps have been addressed in the protocol itself or in canonical patterns around it. OAuth 2.1 with PKCE is now the standard auth flow for remote MCP servers. Server registries let the enterprise control which servers a given client is allowed to discover. Per-tool permission scoping is a standard pattern. The remaining concerns are real but smaller, and most of them are also concerns with the bespoke alternatives — they are just more visible with MCP because the protocol is explicit about what is happening.

"We don't trust the auth flow"

Specific version of the above. The worry is that a malicious or compromised MCP server could exfiltrate data by tricking the model into calling it.

This is a real risk class but the answer is the same as for any tool-calling architecture: server allowlist at the client level (the agent will only talk to MCP servers on its approved list), content security policies on tool returns (the model cannot exfiltrate data through a tool that does not have a permitted destination), and prompt-injection mitigation at the model layer (which is an unsolved problem for tool calling generally, not specifically for MCP).

The protocol does not solve prompt injection. Nothing does, yet. But MCP does not make it worse than the bespoke alternatives, and in some specific ways it makes it better — the structured nature of MCP calls gives you more places to enforce policy than free-form agent-API calls do.
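The client-level allowlist is the simplest of these controls to show. A minimal sketch, assuming a hypothetical internal host `mcp.internal.example`: the check compares the parsed scheme, host and port rather than matching on the raw URL string, because string-prefix checks are easy to fool with look-alike hostnames.

```python
from urllib.parse import urlsplit

# Hypothetical approved servers, keyed by (scheme, host, port).
APPROVED_SERVERS = {
    ("https", "mcp.internal.example", 443),
    ("https", "tickets-mcp.internal.example", 443),
}

_DEFAULT_PORTS = {"https": 443, "http": 80}


def server_key(url: str) -> tuple:
    """Normalise a server URL to the tuple the allowlist is keyed on."""
    parts = urlsplit(url)
    port = parts.port or _DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, (parts.hostname or "").lower(), port)


def is_approved(url: str) -> bool:
    """True only if scheme, host and port all match an approved server."""
    return server_key(url) in APPROVED_SERVERS


for candidate in (
    "https://mcp.internal.example/portfolio",
    "https://mcp.internal.example.evil.test/portfolio",  # look-alike suffix
    "https://mcp.internal.example@evil.test/",           # userinfo trick
    "http://mcp.internal.example/",                      # downgraded scheme
):
    print(candidate, is_approved(candidate))
```

This only covers the first of the three controls named above. Constraining where tool returns can go and mitigating prompt injection need their own enforcement points.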

"We can build it ourselves"

The most expensive of the objections. The architecture function or the AI platform team has built a bespoke internal protocol that looks a lot like MCP but predates it, or that the team finds easier to reason about, or that integrates better with the company's existing identity infrastructure.

The cost of this is the same as any custom-protocol cost: every new tool integration, every new model client, every new vendor relationship goes through the custom layer. The team is now maintaining a small specification, a client library, a server SDK, and the integrations between them. None of that produces business value. All of it is reinvention.

The right move for a team in this position is usually a graceful migration: keep the existing protocol for the existing integrations, adopt MCP for new ones, deprecate the custom layer over a year. The cost of running two protocols for a year is much smaller than the cost of running a custom protocol forever.

Three integration patterns for regulated enterprises

The framework for adopting MCP at enterprise scale comes down to three patterns, depending on what is being integrated. These are the patterns I would use if I were running this build today.

Pattern 1: First-party MCP servers for internal systems

For internal systems that the architecture function controls or has influence over — the EA platform, the application portfolio, the capability model, the PMO data, internal documentation — build a first-party MCP server that exposes those systems through the protocol.

Why first-party: the team that knows the data model is the right team to define what gets exposed. The permissions model can be faithful to the source system's permissions. The tool definitions can be precise rather than approximate. Audit logging can be co-located with the system itself.

What this looks like in practice: an MCP server that runs alongside your internal application, exposing a small number of tools (typically five to twenty per system, not hundreds), each one mapped to a specific use case. Not every API endpoint becomes an MCP tool — that produces an unusable surface area for the model. Curate.

The internal-systems-first pattern is exactly what I would do for the Meridian platform: expose an MCP server that lets any other internal agent query the application portfolio, the capability model, and the CANVAS workflow records, with permissions enforced by the same identity gateway the application already uses.
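A rough sketch of what a curated first-party server does, written against the standard library only so the moving parts are visible. It is not the official MCP SDK and omits transport, initialisation and token validation. The portfolio data, the group table and the single `get_application` tool are hypothetical; in a real build the `user` argument would come from the validated identity token, and the group lookup would go to the identity gateway.

```python
import json

# Hypothetical portfolio data and group membership, standing in for the
# source system and the identity gateway.
PORTFOLIO = {
    "app-001": {"name": "Payments Hub", "owner_group": "treasury", "restricted": True},
    "app-002": {"name": "Intranet Search", "owner_group": "workplace", "restricted": False},
}
USER_GROUPS = {"alice": {"treasury"}, "bob": {"workplace"}}

TOOLS = {}


def tool(name, description, input_schema):
    """Register a curated tool; only registered functions are exposed."""
    def register(fn):
        TOOLS[name] = {"name": name, "description": description,
                       "inputSchema": input_schema, "handler": fn}
        return fn
    return register


@tool("get_application", "Fetch one application record by its portfolio ID.",
      {"type": "object", "properties": {"app_id": {"type": "string"}},
       "required": ["app_id"]})
def get_application(user, app_id):
    app = PORTFOLIO.get(app_id)
    if app is None:
        raise LookupError(f"no application {app_id}")
    # Enforce the end user's permissions, not a service account's.
    if app["restricted"] and app["owner_group"] not in USER_GROUPS.get(user, set()):
        raise PermissionError(f"{user} may not read {app_id}")
    return app


def handle(request, user):
    """Dispatch one JSON-RPC request on behalf of an authenticated user."""
    rid, method = request.get("id"), request.get("method")
    if method == "tools/list":
        listed = [{k: v for k, v in t.items() if k != "handler"}
                  for t in TOOLS.values()]
        return {"jsonrpc": "2.0", "id": rid, "result": {"tools": listed}}
    if method == "tools/call":
        params = request.get("params", {})
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return {"jsonrpc": "2.0", "id": rid,
                    "error": {"code": -32602, "message": "unknown tool"}}
        try:
            data = entry["handler"](user, **params.get("arguments", {}))
            text, is_error = json.dumps(data), False
        except (LookupError, PermissionError) as exc:
            # Tool-level failures are reported inside the result.
            text, is_error = str(exc), True
        return {"jsonrpc": "2.0", "id": rid,
                "result": {"content": [{"type": "text", "text": text}],
                           "isError": is_error}}
    return {"jsonrpc": "2.0", "id": rid,
            "error": {"code": -32601, "message": "method not found"}}
```

The registry is the curation step: an endpoint that is not decorated with `tool` is simply not visible to the model, which keeps the exposed surface at a handful of use-case-shaped tools rather than the whole API.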

Pattern 2: Vendor-provided MCP servers for SaaS systems

For SaaS systems your company already uses (Salesforce, Atlassian, Linear, Slack, your data warehouse, your CI system), use the vendor's official MCP server if they have one. Most major SaaS vendors do by mid-2026; the rest will within a year.

Why vendor-provided: the vendor maintains the integration, including keeping it in sync with their own API evolution. You inherit their permissions model, their rate limiting, their authentication. You do not write integration code.

What to validate before deploying: the vendor's MCP server runs in a region consistent with your data residency requirements, the auth flow is OAuth 2.1 with PKCE (or stronger), the tool returns can be constrained to specific scopes, the audit log is exportable to your own infrastructure. Same diligence as for any third-party SaaS integration.

Where the vendor doesn't have an MCP server: assess whether the community has built one with quality you can rely on (the GitHub MCP servers list is the de-facto registry), and if not, either build your own thin wrapper around the vendor's API or wait. Do not adopt low-quality community servers into a regulated production environment without your own audit pass.

Pattern 3: A central MCP gateway

For the architecture function's overall posture, run an internal MCP gateway that sits between agents and downstream servers. The gateway provides:

A registry of approved MCP servers. Agents discover capability through the gateway, not through ad-hoc server lists.
Auth proxying. The gateway holds the OAuth tokens for downstream servers and exchanges them for short-lived credentials per agent call. The agent never holds long-lived tokens.
Audit logging. Every call through the gateway is logged centrally, in the company's SIEM. This is the implementation point for the audit framework.
Policy enforcement. The gateway can block tool calls that violate policy (e.g., a tool that would exfiltrate PII to an external destination, a tool from an un-approved server, a call with parameters outside an allowed range).

This gateway pattern is the missing piece in most enterprise MCP deployments. It is also the piece that turns MCP from "a protocol the developers use" into "a governed enterprise capability". Without it, every agent makes its own decisions about which servers to talk to and how. With it, the architecture function has a single control point.
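The control-point idea can be sketched in a few dozen lines. This is a toy under stated assumptions, not a gateway design: the registry maps server names to plain callables instead of real MCP connections, auth proxying is left out, the in-memory `audit_log` list stands in for the SIEM, and the two policies (`max_limit`, `internal_destinations_only`) and the `@corp.example` domain are hypothetical examples of the kinds of rule described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A policy inspects a call and returns a denial reason, or None to allow.
Policy = Callable[[str, str, dict], Optional[str]]


def max_limit(server: str, tool: str, args: dict) -> Optional[str]:
    """Deny calls whose 'limit' parameter is outside the allowed range."""
    if args.get("limit", 0) > 1000:
        return "limit above allowed range"
    return None


def internal_destinations_only(server: str, tool: str, args: dict) -> Optional[str]:
    """Deny calls that would send data to an external destination."""
    dest = args.get("destination")
    if dest is not None and not dest.endswith("@corp.example"):
        return "external destination not permitted"
    return None


@dataclass
class Gateway:
    registry: Dict[str, Callable[[str, dict], dict]]  # approved servers only
    policies: List[Policy]
    audit_log: List[dict] = field(default_factory=list)

    def call(self, agent: str, server: str, tool: str, args: dict) -> dict:
        reason = None
        if server not in self.registry:
            reason = "server not in approved registry"
        else:
            for policy in self.policies:
                reason = policy(server, tool, args)
                if reason:
                    break
        # Log every attempt, allowed or denied, before acting on it.
        self.audit_log.append({
            "agent": agent, "server": server, "tool": tool, "arguments": args,
            "decision": "deny" if reason else "allow", "reason": reason,
        })
        if reason:
            raise PermissionError(reason)
        return self.registry[server](tool, args)


gateway = Gateway(
    registry={"tickets": lambda tool, args: {"tool": tool, "rows": []}},
    policies=[max_limit, internal_destinations_only],
)
print(gateway.call("agent-1", "tickets", "search", {"limit": 5}))
```

Denied calls are logged as well as allowed ones, since a denied attempt is often the more interesting audit event.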

The gateway is build-yourself work today. By 2027 there will be commercial gateway products. By 2028 the gateway will be a standard piece of the AI platform stack alongside the LLM proxy and the prompt registry.

What to implement first

If the architecture function is starting MCP adoption from zero today, the order of operations:

Month one: build one first-party MCP server for an internal system you control. Pick the system with the highest agent-query volume in your existing setup. Expose it via MCP. Wire up one agent (Claude Code in the development environment is the easiest first client). Validate the auth flow, the audit logging, and the permission delegation. This is the proof-of-life.
Months two and three: deploy the central gateway. Build it yourself if there is nothing on the market that fits your requirements. Migrate the first-party server from month one behind the gateway. Migrate one or two vendor MCP servers (Atlassian, GitHub, your CI system) behind the gateway.
Months four to six: standardise on MCP for new integrations. Any new agent-to-system integration is required to go through MCP. Run an architectural exception process for cases where it is not yet possible. The architectural drift cost of this constraint is small; the long-term simplification is significant.
Month six onwards: migrate the legacy. Deprecate the existing bespoke integration layer. Pick the top three highest-traffic custom integrations; rewrite them as MCP servers. Stop maintaining the old layer.

By the end of the year you have an MCP-first agent platform. The new integration cost has dropped. The vendor-lock-in surface has shrunk. The audit posture is uniform across agents and tools. The architecture function has a clean abstraction to reason about.

The honest limitations

To be balanced about it: MCP is not a complete answer to every agent-integration problem.

The protocol does not solve prompt injection. An MCP-mediated tool call is as exposed to malicious prompts as a direct API call. The defences are at the model layer (prompt-injection-resistant system prompts, output validation, sandboxing of high-risk tools) and at the gateway layer (policy enforcement on tool calls), not at the protocol layer.

The protocol does not specify a great answer for long-running operations. A tool call that takes minutes (a complex database query, a large analytics job, an external system that's slow to respond) is awkward in MCP today. There are extension patterns emerging (async tool calls with callbacks, polling endpoints, job handles) but they are not universal. For workloads that require long-running operations, you will need an extension or a workaround.
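The job-handle workaround is the easiest of these to illustrate. In this sketch the server would expose two ordinary tools, one that starts the work and returns a handle immediately, and one that polls the handle. The `JobStore` class and `wait` helper are hypothetical names, the store is in-memory and single-process, and nothing here is part of the MCP specification.

```python
import threading
import time
import uuid


class JobStore:
    """In-memory job registry backing a start tool and a status tool."""

    def __init__(self):
        self._jobs = {}
        self._lock = threading.Lock()

    def start(self, fn, *args) -> str:
        """Run fn(*args) in the background and return a job handle at once."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "running", "result": None, "error": None}

        def run():
            try:
                update = {"status": "done", "result": fn(*args)}
            except Exception as exc:  # report any failure through the handle
                update = {"status": "failed", "error": str(exc)}
            with self._lock:
                self._jobs[job_id].update(update)

        threading.Thread(target=run, daemon=True).start()
        return job_id

    def poll(self, job_id: str) -> dict:
        """Return a snapshot of the job state; this is the status tool."""
        with self._lock:
            return dict(self._jobs[job_id])


def wait(store: JobStore, job_id: str, timeout: float = 5.0, interval: float = 0.01) -> dict:
    """Client-side polling loop with a deadline."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = store.poll(job_id)
        if state["status"] != "running":
            return state
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")


store = JobStore()
handle_id = store.start(lambda n: sum(range(n)), 1000)
print(wait(store, handle_id))
```

The trade-off is that the agent now has to hold the handle across reasoning steps and decide when to poll, which is exactly the kind of behaviour that differs between clients until a standard pattern settles.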

The protocol does not specify a great answer for bidirectional streaming inside a tool call. Pure unidirectional streaming (server to client) is well-supported. Mid-call user prompts, agent clarifications, or interactive flows inside a tool call are not fully standard.

The vendor-implementation maturity varies. Some MCP servers are excellent (the major vendor-provided ones tend to be good). Some are early-stage and not production-quality. The "is this server production-ready" assessment is not standardised; you have to do your own diligence.

These limitations are real. None of them is a reason to ignore the protocol. All of them are addressable in the integration pattern you choose.

Where this is going

A short prediction. By the end of 2027:

MCP is the default protocol for new agent-to-system integrations in the enterprise. The number of teams writing bespoke function-calling wrappers will be small and shrinking.
A handful of commercial MCP gateway products will exist, with meaningful market share. The build-yourself gateway becomes a niche choice rather than the only choice.
Most major SaaS vendors will have an official MCP server. The ones that don't will be at a competitive disadvantage in AI-augmented workflows.
The protocol itself will have stabilised. The 1.x to 2.x transition will have happened with reasonable backward compatibility. The specification will look more like LSP does today — boring, mature, reliable.

The companies that adopt MCP in 2026 will benefit from being ahead of the curve when the gateway market matures and the SaaS ecosystem fills out. The companies that wait until 2028 will be re-architecting then.

If you are running an agent program in a regulated company right now and you do not have an MCP strategy, you have a strategy gap that is going to cost you. The pieces are there to adopt today. The adoption cost is low and falling. The integration surface area you are currently building bespoke is going to be the thing you regret most when MCP becomes the default. Move now.

Cursor in a regulated industry: the actual policy you need

2026-05-28T00:00:00.000Z

TL;DR

There are now four serious AI coding tools in widespread enterprise use: Cursor, Claude Code, GitHub Copilot, and Windsurf. Every regulated company I have talked to is in one of two states with respect to them. Either they have officially banned them (developers use them anyway, on personal devices, with company code) or they have rubber-stamped them under an "AI policy" written by someone who has not actually used the tools. Both are worse than doing nothing, because both produce the illusion of governance without the substance.

This piece is the policy I would actually write today for a regulated enterprise — a financial services firm, a healthcare provider, a utility, a defence contractor. Six policy areas, each one specific enough to enforce: data residency, prompt and code logging, code-review attribution, intellectual property, secrets handling, and third-party dependency exposure. For each area, the question, the wrong answer, the right answer, and the enforcement mechanism. Plus vendor-specific configurations for the four major products and a note on what architecture owns versus what security owns.

If you are about to publish your company's "AI coding policy" this quarter, read this first.

Why "ban it" and "approve it" both fail

The ban fails because developers use the tools anyway. They use them on personal laptops, on personal accounts, with snippets of company code pasted across. The code that gets pasted out is often the code that is most in need of help — the stuck-debugging code, the authentication module that has gone sideways, the SQL query the team cannot get to perform. Exactly the wrong code to leak. The ban gives the company plausible deniability while leaving every actual security concern in place.

The rubber-stamp fails because it produces a policy that nobody can actually comply with, because the people writing it have not used the tools. The policy will say things like "developers must not paste sensitive code into AI tools" — which is technically true and operationally meaningless, because the whole point of the tool is to read your code. It will say "AI-generated code must be reviewed for quality" — true of all code; how does AI change the review standard? It will say "do not use AI tools for production code" — fine, but then what is the boundary between non-production and production in a continuous-delivery shop where every commit might end up in production within hours?

The result, in both cases, is that the actual governance question goes unanswered. The actual governance question is: what is the specific set of conditions under which an AI coding tool can read, write, or influence company code, in a way that is auditable, that respects regulatory and contractual obligations, and that does not require asking individual developers to make impossible judgment calls?

That is what a policy needs to answer. Here are the six clauses.

Clause 1: data residency

The question. Where does the model that is reading or writing my code physically run? Where does the data my code touches get sent during inference? Whose jurisdiction has access to it under what legal process?

The wrong answer. "The AI vendor says they are GDPR-compliant." This is necessary but nowhere near sufficient. GDPR compliance is about handling personal data lawfully. On its own it tells you little about where source code and prompts are processed, which is the part that matters under FCA outsourcing rules, under the EU AI Act's high-risk provisions, under DORA's third-party risk requirements, and under sector-specific rules for healthcare, defence, and critical infrastructure.

The right answer. A specific residency requirement, stated as a hard constraint. For an EU-headquartered company in financial services, that looks like: "model inference must occur in an EU member state. The model provider must be able to provide written attestation of the inference region for any given request, on demand. If the inference is routed through a cloud hyperscaler, the cloud region must be EU. If the underlying inference cannot be confirmed to occur within the EU, the tool is not approved for use against company code."

For a US-headquartered company in healthcare with HIPAA exposure, the constraint is different but the structure is the same: a hard region constraint with an attestation requirement.

The enforcement mechanism. Procurement contracts include the specific region clause. The architecture function reviews the vendor's inference architecture annually. The security team has a quarterly check that the configuration in use matches the contract.
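
The quarterly configuration check is easy to automate once the contracted regions are written down as data. A minimal sketch, assuming the tool configurations can be exported as simple records; the class name, field names, and region list here are illustrative, not taken from any vendor's API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolEndpointConfig:
    """Inference endpoint a tool is configured to use (illustrative fields)."""
    tool: str
    provider: str
    region: str


def residency_violations(configs, approved_regions):
    """Return one message per tool whose configured region is not contracted."""
    problems = []
    for cfg in configs:
        if cfg.region not in approved_regions:
            problems.append(
                f"{cfg.tool}: region '{cfg.region}' is not in the contracted set"
            )
    return problems


# Example region names only; the real list comes from the procurement contract.
APPROVED_EU_REGIONS = frozenset({"eu-west-1", "eu-central-1", "europe-west4"})
```

An empty result means the configuration in use matches the contract. Anything else goes to the security team as a finding.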

Practical complication. Some of the tools (specifically Claude Code and Copilot Enterprise) can route through a customer-owned cloud endpoint — Amazon Bedrock, Azure OpenAI, Google Vertex AI. This is the enterprise-grade answer. The tool runs against an inference endpoint inside your own cloud account, in a region you control, with logging you own. If the tool supports this and you are in a regulated industry, this is the configuration to use. Do not use the vendor's default consumer endpoint.

Clause 2: prompt and code logging

The question. What record exists, after the fact, of what was sent to the AI tool and what was returned? Where is that record stored? Who can access it?

The wrong answer. Either "we don't log it, to protect developer privacy" or "the vendor logs it for 30 days, we trust them". Neither survives a serious audit. The first produces no evidence at all. The second produces evidence that lives outside your control, with a retention schedule the vendor sets, in a system you cannot query.

The right answer. Customer-owned logging of every AI tool interaction, stored in the company's own SIEM or equivalent log infrastructure, with retention matching the company's broader log retention policy (typically two to seven years in regulated industries). The log captures: the prompt, the response, the model identifier, the timestamp, the user identity, the project context, the IDE session ID. Same audit-grade discipline as for any other sensitive system.

The enforcement mechanism. Tool configuration that routes logging to the company's infrastructure, not the vendor's. Periodic audit that the configuration is in place. Sample queries against the log to confirm prompts are actually being captured. Disconnect from the tool any developer whose IDE session is not logging properly.
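
To make the "sample queries" step concrete, here is a rough sketch of the log record and a completeness check over a sampled line. The field names mirror the list in the right answer above; the class and function names are my own assumptions, and a real SIEM schema will differ:

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AiInteractionRecord:
    """One logged AI-tool interaction; field names are illustrative."""
    prompt: str
    response: str
    model_id: str
    timestamp_utc: str
    user_id: str
    project: str
    ide_session_id: str


# Field names in declaration order, used by the completeness check below.
REQUIRED_FIELDS = tuple(AiInteractionRecord.__dataclass_fields__)


def to_log_line(record: AiInteractionRecord) -> str:
    """Serialise the record as a single JSON line for shipping to the log store."""
    return json.dumps(asdict(record), sort_keys=True)


def missing_fields(log_line: str) -> list:
    """Return the required fields that are absent or empty in a sampled line."""
    data = json.loads(log_line)
    return [name for name in REQUIRED_FIELDS if not data.get(name)]
```

Running missing_fields over a random sample of recent log lines is a cheap way to confirm that prompts are actually being captured, not just that the pipeline is connected.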

Practical complication. Not all tools support customer-owned logging at the granularity you need. Cursor's enterprise tier supports prompt-level logging to a customer-owned destination. Claude Code can be configured to log through the Anthropic API with customer-managed logging if you proxy through your own infrastructure. Copilot Enterprise supports audit log export. Windsurf is more limited. The tool selection question and the logging question are linked: pick a tool that supports the logging discipline your audit requires.

This is essentially the same instrumentation discipline I described in how to audit a decision an agent made, applied to a developer-facing tool.

Clause 3: code-review attribution

The question. When AI generates or substantially modifies a piece of code that lands in your repository, who is recorded as the author, and how is that disclosed in code review?

The wrong answer. Either "treat AI-generated code as if a human wrote it" or "require a separate label on every line of AI-touched code".

The first is the path most teams default to. It produces unaccountable code. A bug six months later cannot be traced back to the prompt that produced it. A regulator asking "who wrote this" gets a developer name and no record of the AI involvement.

The second is technically pure and practically unworkable. Modern AI coding tools produce code in a continuous loop with the developer. Pretending you can label every AI-influenced character is silly. The label becomes either pervasive (every file is labelled AI-touched, which conveys no information) or selective (the developer decides what to label, which is exactly the judgment call the policy was meant to remove).

The right answer. Two separate records, kept distinct.

First, a commit-level co-author attribution: every commit that incorporates significant AI assistance is marked with a Co-Authored-By trailer naming the AI tool and model version. This is the lightweight, git-native disclosure. It does not claim to label every line; it claims to label the commit as one where AI was substantially involved. The threshold for "substantial" is a team norm, not a policy clause — typically, "more than a single autocomplete suggestion".

Second, an out-of-band session log: the prompt-and-response log from Clause 2 captures the full record. The git commit links back to the relevant session via a session ID in the commit message. The git history shows what was committed; the session log shows how it got there.

The enforcement mechanism. Pre-commit hook that prompts the developer if AI assistance was used and adds the Co-Authored-By trailer if so. CI check that any commit marked as AI-assisted has a corresponding session ID. Code review checklist item: "is the session log linked, if AI was used".
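
The CI check can be a few lines. A minimal sketch, assuming the session ID is carried in a trailer I am calling AI-Session-Id (a hypothetical name; the policy only requires that the session ID is in the commit message) and that AI co-authors are recognised by tool name:

```python
import re

# Matches a Co-Authored-By trailer and captures the name part before <email>.
CO_AUTHOR = re.compile(
    r"^Co-Authored-By:\s*(?P<name>.+?)\s*<[^>]*>\s*$",
    re.IGNORECASE | re.MULTILINE,
)
# Hypothetical trailer name for the link back to the session log.
SESSION_TRAILER = re.compile(r"^AI-Session-Id:\s*[A-Za-z0-9-]{8,}\s*$", re.MULTILINE)
# Crude name matching; a real hook would use an agreed list of bot identities.
AI_TOOL_NAMES = ("cursor", "claude", "copilot", "windsurf")


def is_ai_assisted(message: str) -> bool:
    """True if any Co-Authored-By trailer names one of the known AI tools."""
    for match in CO_AUTHOR.finditer(message):
        name = match.group("name").lower()
        if any(tool in name for tool in AI_TOOL_NAMES):
            return True
    return False


def check_commit(message: str) -> list:
    """Return policy violations for a commit message (empty list means pass)."""
    if is_ai_assisted(message) and not SESSION_TRAILER.search(message):
        return ["AI-assisted commit has no AI-Session-Id trailer"]
    return []
```

The check only runs one way on purpose: it verifies that a commit marked as AI-assisted links to a session, and leaves detection of unmarked AI use to the pre-commit hook.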

Practical complication. Developers will not voluntarily mark every AI-assisted commit. The pre-commit hook can default-on if the IDE indicates AI activity. Cursor, Claude Code, and Copilot all expose enough telemetry to the local environment that this is detectable. Pure mandatory self-disclosure does not work; auto-detection with a manual override does.

Clause 4: intellectual property

The question. Who owns the code that an AI tool produces? Under what licence? With what indemnity if the output reproduces something covered by a third-party copyright?

The wrong answer. "We accept the AI vendor's standard terms." The standard terms vary widely between vendors and most of them shift more risk to the customer than a careful read would suggest. Some vendors offer indemnity for output reproducing copyrighted training material; some don't. Some retain the right to train on your code; some don't. Some grant a perpetual licence to all code produced through their tool; some don't.

The right answer. A negotiated enterprise contract that explicitly covers four things:

IP ownership of outputs. Customer owns all code produced through the tool against customer code. No exceptions.
No training on customer code. Vendor agrees not to use customer code (prompts or outputs) to train future models.
Indemnity for output infringement. If the tool's output reproduces copyrighted material and the customer ships it and then faces a claim, the vendor indemnifies. Most major vendors now offer this: Copilot, Cursor, and Claude Code all have some form of it on enterprise tiers. Read the actual cap; the indemnity is often dollar-limited.
Data-handling terms that match the company's standard data processing agreement. If the vendor cannot meet the company's standard DPA, that is itself a signal.

The enforcement mechanism. Legal review of the contract before deployment. Annual recertification. If the vendor terms change materially, re-review.

Practical complication. Developers using personal accounts against company code is the IP-leakage attack vector that the policy needs to close. Even with the right contract for enterprise licences, an individual developer using a free tier with company code is operating outside the negotiated terms. Tool access has to be SSO-enforced and personal accounts have to be blocked at the network layer or the device-management layer.

Clause 5: secrets handling

The question. What stops a developer from accidentally pasting a production API key, a database password, or a customer record into an AI tool?

The wrong answer. "We tell developers not to do this." The training-and-awareness approach has a well-documented track record of not working. Developers paste secrets into Stack Overflow. They paste secrets into bug-tracker tickets. They paste secrets into AI tools.

The right answer. A pre-flight scrubbing layer that intercepts prompts before they leave the developer's machine. Specifically:

A local-machine prompt-scanning hook integrated with the IDE. Scans for high-entropy strings, known credential formats (AWS keys, Azure connection strings, JWT tokens, OpenAI keys, etc.), PII patterns (NHS numbers, NI numbers, credit card numbers).
If a secret is detected, the prompt is blocked from being sent. The developer sees a warning explaining what was caught.
The blocked event is logged. The same audit framework as for successful prompts.

This is not a perfect defence — context-dependent secrets (an internal hostname, a customer's company name) are not scannable — but it eliminates the catastrophic-and-common case of an actual API key going to a vendor.
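
A rough sketch of the scanning step, to show how little code the catastrophic-and-common case needs. The patterns, the token length, and the entropy threshold are assumptions for illustration; a production scanner would carry a much larger rule set and would need tuning against false positives such as long file paths:

```python
import math
import re
from collections import Counter

# A small illustrative rule set, not a complete catalogue of credential formats.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}
# Candidate tokens for the entropy check: long runs of key-like characters.
TOKEN = re.compile(r"[A-Za-z0-9+/=_-]{24,}")
ENTROPY_THRESHOLD = 4.0  # bits per character; an assumed starting point


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_prompt(prompt: str) -> list:
    """Return the names of every rule the prompt trips."""
    findings = [name for name, pattern in PATTERNS.items() if pattern.search(prompt)]
    if any(shannon_entropy(tok) >= ENTROPY_THRESHOLD for tok in TOKEN.findall(prompt)):
        findings.append("high_entropy_string")
    return findings


def should_block(prompt: str) -> bool:
    """Block the prompt if any rule fired."""
    return bool(scan_prompt(prompt))
```

The findings list is what goes into the blocked-event log and into the warning the developer sees.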

The enforcement mechanism. Mandatory installation of the scrubbing hook on every developer machine running an approved AI tool. Periodic check that it is running. Alerts on bypass attempts.

Practical complication. The scrubbing layer adds latency to every prompt. Developers will work around it if the latency is bad. Tune for sub-100ms scrubbing time. Most current scanners can hit this if the regex library is reasonable.

Clause 6: third-party dependency exposure

The question. When the AI tool suggests using a third-party library, what stops it from suggesting a malicious or vulnerable one?

The wrong answer. "The developer will check." Developers do not check. Developers check less when the suggestion looks confident. A library suggestion that comes wrapped in a fluent explanation of why it is the right choice gets less scrutiny than a library suggestion they found themselves.

The right answer. The same software supply chain controls that should already exist, with the AI tool's suggestions explicitly in scope:

An allowlist of permitted packages and package sources (or, failing that, a denylist of prohibited ones). The AI tool cannot suggest a library that is not on the allowlist; if it does, the suggestion is blocked at IDE level.
A vulnerability scanner that runs on every dependency added, whether suggested by AI or by a human. CVE thresholds match the company's broader vulnerability policy.
A typosquatting check: a library name that is very close to but not exactly a popular package name is flagged. This is one of the attack vectors where AI tools have been documented producing unsafe suggestions.
A "hallucinated package" check: if the AI suggests a library that does not exist in the company's package registry mirror, the suggestion is blocked. Hallucinated packages have been an emerging vector for supply-chain attacks specifically because they pre-create the demand that an attacker can then satisfy with a malicious package of the same name.

The enforcement mechanism. The package allowlist is maintained by the security team and consumed by the IDE plugin. The scanner is part of the CI pipeline. The typosquatting check is in the IDE plugin.
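
The allowlist, typosquatting, and hallucinated-package checks can share one classification step in the IDE plugin. A minimal sketch under stated assumptions: the allowlist and the registry mirror are available as sets of names, and "very close" means an edit distance of two or less, which is my choice and will misfire on short package names:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def classify_suggestion(name, allowlist, mirror, max_distance=2):
    """Classify a suggested package name against the allowlist and the mirror."""
    candidate = name.strip().lower()
    if candidate in allowlist:
        return "allowed"
    # A near miss on an approved name is treated as a likely typosquat.
    if any(edit_distance(candidate, known) <= max_distance for known in allowlist):
        return "possible_typosquat"
    # A name the registry mirror has never seen may be hallucinated.
    if candidate not in mirror:
        return "not_in_mirror"
    return "needs_approval"
```

Only "allowed" lets the suggestion through. "needs_approval" is the case that feeds the expedited approval path.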

Practical complication. New legitimate libraries are added to ecosystems daily. The allowlist needs an expedited approval path or it will be ignored. Plan for that.

Vendor-specific configurations

The six clauses above are tool-agnostic. The configurations to implement them vary. A quick reference for the four major tools as of mid-2026:

Cursor

Clause	Configuration
Residency	Cursor Business / Enterprise plans support routing inference to customer-owned LLM endpoints (Bedrock, Azure OpenAI, Vertex AI). Use that. The default consumer endpoint routes through the vendor's infrastructure with less control.
Logging	Enterprise tier supports logging to a customer-owned destination. The logs include the prompt, the response, the file context, and the user identity. Confirm during contract that this can be exported to your SIEM.
IP terms	Enterprise contract includes indemnity for output infringement and no-training-on-customer-data terms. Free tier does not. Block free tier at the SSO layer.
SSO	Cursor supports SCIM provisioning and SAML SSO on enterprise. Required.

Claude Code

Clause	Configuration
Residency	Claude Code can run against the Anthropic API directly or through Amazon Bedrock or Google Vertex AI. The Bedrock and Vertex options give you the customer-owned inference region. Use those for regulated workloads.
Logging	Anthropic offers enterprise audit logging. Bedrock gives you AWS CloudTrail logging and Vertex gives you the Google Cloud equivalent (Cloud Audit Logs). Both routes are workable; the Bedrock/Vertex path is more aligned with existing enterprise log discipline.
IP terms	Anthropic enterprise contract offers output indemnity and no-training commitment. Read the indemnity cap; it is non-trivial but bounded.
SSO	Required on enterprise tier.

GitHub Copilot Enterprise

Clause	Configuration
Residency	Microsoft routes Copilot inference through Azure OpenAI infrastructure. The customer can request a specific region. EU customers should specify an EU region in the contract.
Logging	Copilot Enterprise has audit log export. The grain is per-suggestion rather than per-prompt; the model context the suggestion was based on is captured. Sufficient for most audit purposes.
IP terms	Microsoft indemnity is the most generous in the market and the most extensively litigated. Read the carve-outs (notably for "duplicate detection turned off"). Leave duplicate detection on.
SSO	The enterprise tier requires GitHub Enterprise, which is where SSO is configured. Required.

Windsurf

Clause	Configuration
Residency	Less mature in this area as of mid-2026. Limited customer-owned-endpoint options. For regulated workloads, treat as conditional approval at best.
Logging	Limited enterprise logging options.
IP terms	Newer to enterprise contracting; terms are less standardised.
SSO	Available on enterprise tier.

The pattern: for serious regulated workloads, Cursor (with a customer-owned LLM endpoint), Claude Code (via Bedrock or Vertex), or Copilot Enterprise are the workable choices. Windsurf is fine for non-regulated workloads but does not yet have the enterprise controls the other three have.

What architecture owns vs what security owns

A short note on the politics of this, because every regulated company I have worked with has the same conversation about who owns the policy.

Security owns the enforcement layer: the SSO configuration, the network blocks on personal accounts, the prompt-scrubbing hooks, the vulnerability scanners, the SIEM integration. They are the operational owner of "is the policy actually being followed".

Architecture owns the policy itself: what the clauses are, what tools are approved, what configurations are required, what trade-offs are acceptable. They are the technical authority on "what should the policy say".

Legal and procurement own the contract: the IP terms, the indemnity, the data-handling clauses, the residency commitments.

The Chief Risk Officer or equivalent owns the residual risk acceptance: signs off that the policy as written is consistent with the company's risk appetite.

When this is unclear, the policy drifts. Security writes a policy that is too restrictive because they cannot model the development workflow. Architecture writes a policy that is too permissive because they cannot model the residual risk. Legal writes a policy that is unimplementable because they cannot model the tool's actual capability. The four functions need to be in the same room when the policy is written.

A transition plan

If your company is currently in one of the two failure states (ban or rubber-stamp), here is how to get to a working policy in roughly ninety days.

Days 1–14: Inventory the actual usage. Survey developers anonymously about which AI tools they are using, on which devices, with what data. Most companies are shocked by the results. The right baseline is honesty about what is already happening, not what the policy nominally allows.

Days 15–30: Draft the six clauses against your context. Use the framework above as a starting point. Specifics vary by industry, by jurisdiction, by sensitivity of code base. Convene architecture, security, legal, procurement, and risk in the same room. Write the policy in language your developers can understand.

Days 31–60: Negotiate the enterprise contract for one tool. Pick one tool to standardise on first. Multiple tools are fine later; one is enough to begin. Negotiate the enterprise contract to match the clauses. Be willing to walk away from a vendor that will not meet the data residency or logging requirements.

Days 61–75: Deploy the enforcement layer. SSO configuration. Network blocks on free tiers. Pre-commit hooks. Prompt scrubbing. SIEM integration. The plumbing.

Days 76–90: Roll out, observe, iterate. Phased deployment to one engineering team, then the rest. The first team is the canary; observe what breaks. The policy will need adjustment in the first month. Plan for that.

After ninety days you will have a working policy, an enforcement layer, an audit trail, and a defensible position when the next regulatory cycle starts.

What this is, and what it is not

This is a policy framework. It is not a security strategy. It is not an AI strategy. It is a specific governance layer aimed at one specific question: how does this company use AI coding tools in a way that is auditable, contractually sound, and consistent with the regulatory environment.

It assumes a regulated context. If you are running a B2B SaaS with no regulated customer base, several of these clauses are overkill. If you are running a defence contractor or a systemically important financial institution, several of these clauses are not strict enough. Calibrate to context.

It also assumes the tools will get better. They will. The policy needs to be revisable. Quarterly review of the approved-tools list, annual review of the contract terms, periodic spot-checks on the enforcement layer. Treat it as a living document.

The point is not to slow down AI adoption. The point is the opposite: a working policy lets a company adopt AI coding tools at scale with the regulatory exposure controlled. Companies without a working policy slow down anyway, because every team makes the governance decision individually and badly. A working policy removes the ambiguity and lets the development organisation actually get on with it.

If you are about to publish your "AI policy" this quarter and have not yet written the six clauses with the specificity above, push the publish date.

How do you audit a decision an agent made? A working framework

2026-05-21T00:00:00.000Z

TL;DR

The single hardest unsolved problem with deploying AI agents into regulated enterprises is not capability, latency, hallucination, or cost. It is auditability. When General Counsel, the Chief Compliance Officer, or a regulator asks "show me, in full, what this system told my employee on March 12 at 14:32, what data it looked at when it produced that answer, and what action was taken as a result", most agentic systems in production today cannot answer the question. This is a design failure, not an inevitability.

The framework that follows treats audit as four distinct layers that must each be captured separately and verifiably: the request (what was asked), the context (what the model was given), the generation (what the model produced), and the action (what the system did with the output). Each layer has a specific data model, a specific storage discipline, and a specific failure mode. This is the pattern I built into Meridian and into CANVAS, and it is the pattern I would carry into any agent deployment in a regulated environment.

If you are running an agent in production right now and any of the four layers is missing, your audit story does not actually work. This piece walks through the implementation.

Why most AI governance frameworks don't survive contact

There are now several reasonable-quality AI governance frameworks in the public domain: the NIST AI RMF, the EU AI Act compliance guidance, the various sector-specific overlays (FCA's discussion papers on AI in financial services, the FDA's draft guidance on AI/ML medical devices, the Bank of England's supervisory statements, and the equivalents in other jurisdictions). They are useful. They are also, mostly, written at the level of policies, principles, and intended outcomes — not at the level of the data structures and code paths that determine whether the policy is actually implementable.

This produces a familiar pattern. The Risk function publishes a sound-looking AI policy. Architecture nods along. Engineering ships the agent. Six months later, the first proper audit happens. The auditor asks for the records that the policy implies should exist. The records don't exist, or they exist in five different systems, or they exist but cannot be linked together because the system that called the LLM didn't log the trace ID that the system that took the action recorded.

Audit is not a policy problem. Audit is an instrumentation problem. Instrumentation has to be designed in. Retrofitting it is expensive and produces a worse result.

This piece is the instrumentation framework I would build into any agentic system that needs to survive regulatory scrutiny.

The four layers

Every agent decision sits on top of four distinct artefacts that must be captured separately:

        ┌─────────────┐
        │   REQUEST   │ ← what the user asked, in what context, with what permissions
        └──────┬──────┘
               │
               ▼
        ┌─────────────┐
        │   CONTEXT   │ ← what the model was given to work with (retrieval, tools, system prompt)
        └──────┬──────┘
               │
               ▼
        ┌─────────────┐
        │  GENERATION │ ← what the model produced (raw output, structured parse, confidence)
        └──────┬──────┘
               │
               ▼
        ┌─────────────┐
        │    ACTION   │ ← what the system did as a result (writes, side effects, downstream calls)
        └─────────────┘

The auditor's question — "what happened on March 12" — is actually four sub-questions:

What was the user trying to do? (request layer)
What information did the model see when it decided? (context layer)
What did the model actually say? (generation layer)
What did the system then do? (action layer)

If any of those four cannot be answered with high fidelity and linked back to the others through a stable identifier, your audit is broken. The discipline is to instrument each layer separately, capture it deterministically, and tie the layers together with a trace ID that propagates end to end.

The rest of this piece walks through each layer in turn.

Layer 1: the request

What needs to be captured:

Trace ID. A UUID generated at the entry point of the request, propagated through every downstream call. This is the spine of the whole audit record. Without it, you can capture every layer perfectly and still not be able to link them.
Actor identity. The authenticated user, including the identity-provider claims that were validated at the gateway. Not just "user X" but "user X, authenticated via OIDC against IdP Y, with claims {department: Z, role: W}, at 14:32:07 UTC".
The literal request. Whatever the user actually typed, asked, or submitted. Stored verbatim. Not summarised, not cleaned, not sanitised. If the user pasted an SSN into the chat by accident, you want to know that — both because you may need to scrub it downstream and because the question of "did the system handle a PII-bearing prompt" is itself an auditable event.
Request context that the system used. Was this an authenticated API call, a chat session, a scheduled job? Was the user inside the company network, or remote? Which tenant, if you are multi-tenant?
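Wall-clock timestamp. UTC, to the millisecond. Plus the system's own monotonic clock if you have one. Wall clocks can be adjusted and can jump backwards; monotonic clocks only move forward, which makes them the safer basis for ordering events within a single host.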
Permissions snapshot. The set of permissions the user held at the moment of the request. Not "the user's current permissions" — permissions change — but the snapshot that was used to authorise the call. This is the protection against the "the user used to have access to that data" defence.

The request layer is the easiest of the four to capture well, and the most commonly captured badly. The two failure modes I see most often:

Trace ID is generated downstream, not at the gateway. This means a system-internal failure (the LLM call timing out, a retry, a fallback path) produces a different trace ID than the original request. The audit log shows two trace IDs for what is, from the user's perspective, one event. Always generate the trace ID at the outermost entry point and propagate it.
The user's literal input is paraphrased or stripped of metadata before logging. Often done with good intent — to remove PII or to compress the log. Bad practice. Capture the original; redact in views, not in storage. Storage redaction is a one-way operation that destroys evidence.
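
Both failure modes are avoided by building the request record once, at the gateway, and treating redaction as a read-time concern. A minimal sketch, assuming a Python gateway; the record shape, the function names, and the SSN-like pattern are illustrative rather than the schema used in Meridian or CANVAS:

```python
import re
import time
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RequestRecord:
    """Layer 1 audit record, created once at the outermost entry point."""
    trace_id: str
    actor_id: str
    idp_claims: dict
    literal_request: str        # stored verbatim, never cleaned
    channel: str                # e.g. "chat", "api", "scheduled_job"
    occurred_at_utc: str        # wall clock, millisecond precision
    monotonic_ns: int           # for ordering events on this host
    permissions_snapshot: tuple


def open_request(actor_id, idp_claims, literal_request, channel, permissions):
    """Generate the trace ID and freeze what was true at request time."""
    return RequestRecord(
        trace_id=str(uuid.uuid4()),
        actor_id=actor_id,
        idp_claims=dict(idp_claims),
        literal_request=literal_request,
        channel=channel,
        occurred_at_utc=datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        monotonic_ns=time.monotonic_ns(),
        permissions_snapshot=tuple(sorted(permissions)),
    )


# Illustrative pattern for US SSN-shaped strings; a real view would use more rules.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redacted_view(record: RequestRecord) -> str:
    """Redact for display only; the stored record keeps the original text."""
    return SSN_LIKE.sub("[REDACTED]", record.literal_request)
```

Every retry, fallback, and downstream call then reuses record.trace_id instead of minting its own.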

Layer 2: the context

What needs to be captured:

The system prompt, in full. Not just a reference to "system prompt v3" but the actual text that was sent to the model. System prompts change. In theory the text could be recovered later from a version reference or a prompt-cache key, but in practice you want the full text in the audit record so the audit doesn't depend on that reference still resolving in three years' time.
The retrieved context. Whatever the RAG layer pulled in. Specifically: which documents were retrieved, with what IDs at what versions, in what order, with what similarity scores, and what the actual content of each retrieved chunk was. The chunk content matters because retrieved data can change underneath you — a document gets updated, a record gets soft-deleted, an embedding index is rebuilt. The audit record needs the data as the model saw it, not the data as it exists now.
Tool definitions, if the model was given tools. The schema of every tool the model could have called. Tools change too. The set of tools available to the agent on March 12 may not be the set available today.
Conversation history, if this was a multi-turn interaction. Captured turn by turn, with trace IDs linking back to earlier requests so the full thread can be reconstructed.
Model identifier, including the exact version. "Claude" is not enough. "claude-opus-4-7" is enough. Model versions change. Behaviour changes with them. The audit record needs to know which version made the call.
Sampling parameters. Temperature, top_p, top_k, max_tokens, any stop sequences, any structured-output schemas. Bit-for-bit reproducibility is not guaranteed with most hosted LLMs, but the parameters that influence the distribution of outputs are part of the audit story.

The two failure modes I see most often at this layer:

The retrieved context is referenced but not stored. The audit log says "retrieved 3 documents, IDs 47, 92, 318" but doesn't include the content of those documents at the time of retrieval. Then the documents change. The audit record is now ambiguous — you cannot tell whether the model's response was reasonable given what it actually saw.
The system prompt is stored as a reference, not as text. The audit log says "system prompt: meridian.v3", and meridian.v3 is a pointer to a config file that has since been updated. The audit is unreplayable. Always inline the system prompt text.
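
In code, the fix for both failure modes is to store content, not pointers. A minimal sketch of a context record that inlines the system prompt and each retrieved chunk; the content hash is my addition, included so that later drift in the source document is cheap to detect, and all names are illustrative:

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of the UTF-8 encoding of the text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RetrievedChunk:
    document_id: str
    document_version: str
    rank: int               # position in the retrieval order
    similarity: float
    content: str            # the text exactly as the model saw it
    content_sha256: str


def snapshot_chunk(document_id, document_version, rank, similarity, content):
    """Capture the chunk content and its hash at retrieval time."""
    return RetrievedChunk(
        document_id, document_version, rank, similarity, content, sha256_hex(content)
    )


@dataclass(frozen=True)
class ContextRecord:
    """Layer 2 audit record: everything the model was given, inlined."""
    trace_id: str
    system_prompt_text: str   # full text, not a version pointer
    model_id: str
    sampling: dict
    chunks: tuple


def has_drifted(chunk: RetrievedChunk, current_content: str) -> bool:
    """True if the source document no longer matches what was retrieved."""
    return sha256_hex(current_content) != chunk.content_sha256
```

When an auditor asks whether the answer was reasonable, the record shows what the model saw, and has_drifted shows whether the source has changed since.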

Layer 3: the generation

What needs to be captured:

The raw model output, verbatim. Whatever bytes came back from the model. No formatting, no cleaning, no post-processing applied yet.
The structured parse, if the system extracted structured data from the output. Both the parsed structure and any validation errors that occurred during parsing.
Tool calls made by the model, if applicable. Which tools the model called, with what arguments, in what order, and what each tool returned. Tool calls produce their own sub-audit records, linked by the trace ID and a sequence number.
Latency. How long the model took. Not because latency is inherently auditable, but because a model call that took 30 seconds when it normally takes 3 is a signal that something was unusual about that particular generation.
Cost, if you are tracking it. Input tokens, output tokens, cache reads, cache writes. The economic record is part of the audit record because cost is often the first place anomalies show up.

The failure modes here are mostly about post-processing:

The system stores the cleaned output instead of the raw output. Markdown got rendered to HTML before logging. Citation markers got stripped. The output that the LLM actually produced is no longer present in the audit record, only the version that the rendering layer produced. Always log raw first, render later.
Tool calls are logged as completed actions, not as model decisions. The audit log shows "the system updated record 42", not "the model decided to call updateRecord(42) and the tool succeeded". For agent audit, the decision is the audit-relevant event, not just the outcome.
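
The distinction between decision and outcome is easiest to see as two event types sharing a trace ID and a sequence number. A minimal sketch; the event names and the in-memory list are assumptions for illustration, and in a real system each event would be appended to the audit store:

```python
from dataclasses import dataclass, field


@dataclass
class GenerationAudit:
    """Collects Layer 3 events for one trace, in order."""
    trace_id: str
    raw_output: str                       # verbatim model output, pre-rendering
    events: list = field(default_factory=list)

    def _append(self, kind: str, payload: dict) -> dict:
        event = {"trace_id": self.trace_id, "seq": len(self.events), "kind": kind}
        event.update(payload)
        self.events.append(event)
        return event

    def record_decision(self, tool: str, arguments: dict) -> int:
        """Log that the model chose to call a tool; returns the sequence number."""
        return self._append(
            "MODEL_TOOL_DECISION", {"tool": tool, "arguments": arguments}
        )["seq"]

    def record_outcome(self, decision_seq: int, succeeded: bool, result) -> None:
        """Log what the tool returned, linked to the decision that caused it."""
        self._append(
            "TOOL_OUTCOME",
            {"decision_seq": decision_seq, "succeeded": succeeded, "result": result},
        )
```

The decision event is written before the tool runs, so a crash mid-call still leaves evidence of what the model asked for.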

Layer 4: the action

What needs to be captured:

What the system did with the output. Did it write to a database? Send an email? Update a workflow stage? Call an external API? Each of these is an auditable event in its own right and needs to be captured with the same discipline as the LLM call.
The before-and-after state, for any write. The audit_log table in CANVAS uses a JSONB column for before_state and another for after_state. The diff between them is the auditable change.
The human-in-the-loop record, if there was one. Did a person review and approve the model's suggested action before it executed? If yes, capture who, when, and what they were shown. If no — if the action was fully automated — capture that fact explicitly. "Auto-executed" is a critical audit datum.
The downstream effects, if any. If the action triggered notifications, scheduled jobs, or further agent calls, those effects are part of the audit chain. Trace ID continues to propagate.

The action layer is where most agentic systems either achieve genuine auditability or fail to. The failure modes are subtle:

Actions are logged in the application database but not linked to the trace ID. The application says "record 42 was updated at 14:32 by automation". The audit log says "trace ID abc made a model call at 14:32". Without a stable link between the two, you cannot prove which model call caused which database update.
The human-in-the-loop step exists but is not recorded as part of the agent decision chain. There is a separate approval system that records human sign-offs, but it does not store the trace ID of the model call that produced the suggestion. So "the human approved this" exists in one log; "the model suggested this" exists in another; nothing links them.
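
A minimal sketch of an action row that avoids both failure modes: it carries the trace ID of the model call that caused it, the before and after state, and an explicit automated flag. The helper names and sample values are assumptions for illustration; the payload fields follow the ACTION shape used later in this article.

```python
def state_diff(before, after):
    """Return {field: (old, new)} for each top-level field that changed.

    A field missing on one side is treated as None, which is enough for a sketch.
    """
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}


def build_action_row(trace_id, action_type, target, before, after,
                     approved_by_id=None, approval_trace_id=None):
    """Build an ACTION audit row linked to the originating model call by trace_id."""
    return {
        "trace_id": trace_id,  # same trace ID as the model call that suggested the action
        "layer": "ACTION",
        "action": action_type,
        "payload": {
            "action_type": action_type,
            "target": target,
            "before_state": before,
            "after_state": after,
            # "Auto-executed" is recorded explicitly rather than implied by a missing approver.
            "automated": approved_by_id is None,
            "approved_by_id": approved_by_id,
            "approval_trace_id": approval_trace_id,
        },
    }


auto_row = build_action_row(
    "abc", "update_record", "record-42",
    before={"stage": "draft"}, after={"stage": "approved"},
)
reviewed_row = build_action_row(
    "abc", "update_record", "record-42",
    before={"stage": "draft"}, after={"stage": "approved"},
    approved_by_id="user-7", approval_trace_id="abc",
)
change = state_diff(auto_row["payload"]["before_state"], auto_row["payload"]["after_state"])
```

Because the approval fields sit on the same row as the trace ID, "the model suggested this" and "the human approved this" can be joined without consulting a second system.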

The append-only audit table

The architectural pattern that holds all four layers together is an append-only audit table. Strictly: never updated, never deleted. Insert-only privileges on the application database user. Indexed heavily on trace_id, actor_id, occurred_at, and entity_id.

A minimum-viable schema (PostgreSQL, but the structure is portable):

CREATE TABLE audit_log (
  id              UUID PRIMARY KEY,
  trace_id        UUID NOT NULL,
  layer           VARCHAR(20) NOT NULL,    -- REQUEST | CONTEXT | GENERATION | ACTION
  occurred_at     TIMESTAMPTZ NOT NULL,
  actor_id        UUID,                    -- nullable for SYSTEM actions
  actor_type      VARCHAR(20) NOT NULL,    -- USER | SYSTEM | MODEL
  action          VARCHAR(255) NOT NULL,
  entity_type     VARCHAR(100),
  entity_id       UUID,
  payload         JSONB NOT NULL,          -- layer-specific content
  ip_address      INET,
  user_agent      TEXT
);

CREATE INDEX idx_audit_trace ON audit_log (trace_id, occurred_at);
CREATE INDEX idx_audit_actor ON audit_log (actor_id, occurred_at);
CREATE INDEX idx_audit_entity ON audit_log (entity_type, entity_id);
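
The never-updated, never-deleted rule is easy to demonstrate. The sketch below is a stand-in, not the PostgreSQL setup described here: it uses SQLite from the Python standard library with a cut-down table, and enforces append-only with triggers rather than with INSERT-only privileges on the database role. The trigger names and the sample row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
  id          TEXT PRIMARY KEY,
  trace_id    TEXT NOT NULL,
  layer       TEXT NOT NULL,
  occurred_at TEXT NOT NULL,
  payload     TEXT NOT NULL
);
-- SQLite has no per-role grants, so triggers reject UPDATE and DELETE instead.
CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
""")

conn.execute(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?)",
    ("row-1", "trace-1", "REQUEST", "2026-01-01T00:00:00Z", "{}"),
)


def try_mutation(sql):
    """Run a statement and report whether the table accepted it."""
    try:
        conn.execute(sql)
        return "allowed"
    except sqlite3.DatabaseError as exc:
        return f"rejected: {exc}"


update_result = try_mutation("UPDATE audit_log SET layer = 'ACTION'")
delete_result = try_mutation("DELETE FROM audit_log")
row_count = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
```

In PostgreSQL the same guarantee is better expressed through privileges, so that the application role simply cannot issue the statement; the trigger version only shows the behaviour the table should have.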

The payload column is JSONB because each layer has a different shape. Use a discriminated union in your application code:

Layer	Payload schema
REQUEST	`{ request_text, request_context, permissions_snapshot, idp_claims }`
CONTEXT	`{ system_prompt, retrieved_chunks: [...], tools: [...], model: "claude-opus-4-7", parameters: {...} }`
GENERATION	`{ raw_output, parsed_structure, tool_calls: [...], latency_ms, input_tokens, output_tokens }`
ACTION	`{ action_type, target, before_state, after_state, automated: bool, approved_by_id?, approval_trace_id? }`

The discipline is that every layer for every request produces at least one audit row, and every row carries the trace_id that threads them together.
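
One way to express the discriminated union in application code, sketched in Python with dataclasses. The class names and the choice of which fields are optional are assumptions; the field names come from the table above.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional


@dataclass(frozen=True)
class RequestPayload:
    request_text: str
    permissions_snapshot: list
    idp_claims: dict
    request_context: dict = field(default_factory=dict)
    layer: Literal["REQUEST"] = "REQUEST"


@dataclass(frozen=True)
class ContextPayload:
    system_prompt: str
    retrieved_chunks: list
    tools: list
    model: str
    parameters: dict
    layer: Literal["CONTEXT"] = "CONTEXT"


@dataclass(frozen=True)
class GenerationPayload:
    raw_output: str
    parsed_structure: dict
    tool_calls: list
    latency_ms: int
    input_tokens: int
    output_tokens: int
    layer: Literal["GENERATION"] = "GENERATION"


@dataclass(frozen=True)
class ActionPayload:
    action_type: str
    target: str
    automated: bool
    before_state: Optional[dict] = None
    after_state: Optional[dict] = None
    approved_by_id: Optional[str] = None
    approval_trace_id: Optional[str] = None
    layer: Literal["ACTION"] = "ACTION"


# The layer column is the discriminator that selects the payload shape.
PAYLOAD_TYPES = {
    "REQUEST": RequestPayload,
    "CONTEXT": ContextPayload,
    "GENERATION": GenerationPayload,
    "ACTION": ActionPayload,
}


def parse_payload(layer, payload):
    """Validate a JSONB payload against the shape its layer requires."""
    if layer not in PAYLOAD_TYPES:
        raise ValueError(f"unknown audit layer: {layer!r}")
    # Raises TypeError if a required field is missing or an unexpected one is present.
    return PAYLOAD_TYPES[layer](**payload)


action = parse_payload(
    "ACTION",
    {"action_type": "render_response", "target": "chat_session", "automated": True},
)
```

Validating at write time means a malformed payload fails loudly before it reaches the table, which matters more than usual when rows can never be corrected afterwards.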

The eval question, which is also an audit question

Evals are usually framed as a quality concern. They are also an audit concern, and the audit framing changes how you design them.

The standard eval setup runs a test suite against the model on a schedule and produces a quality score. The audit framing asks a different question: when the regulator asks "how do you know your agent was performing within spec on March 12", what is your evidence?

The answer is the eval log. For every production deployment of a model, you should have:

An eval suite that runs against the model version currently in production.
A schedule (typically nightly) that runs the suite and records results.
A persistent record of every run, including the eval suite version, the model version, the prompts, the expected outputs, and the actual outputs.
An alert that fires when scores drop below a threshold.

When the regulator asks about March 12, the answer is: "on March 12, the eval suite was at version 1.4.2, the model was claude-opus-4-7, the suite ran at 02:00 UTC and scored 94.7% against a 90% threshold, no alerts fired, and the previous seven days of results were between 93.1% and 95.4%". That is an audit-grade answer to a quality question.

The eval log lives in the same kind of append-only structure as the production audit log, with cross-references where useful (a sampled production query can be added to the eval set; the eval set can reference production failures).
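
A minimal sketch of what answering the March 12 question from the eval log could look like. The record type, the function name and the year are assumptions; the version, score and threshold mirror the example answer above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EvalRun:
    run_date: date
    suite_version: str
    model_version: str
    score_pct: float
    threshold_pct: float


def evidence_for(runs, day):
    """Return the recorded evidence for one day, or None if no run was logged."""
    for run in runs:
        if run.run_date == day:
            return {
                "suite_version": run.suite_version,
                "model_version": run.model_version,
                "score_pct": run.score_pct,
                "threshold_pct": run.threshold_pct,
                "within_spec": run.score_pct >= run.threshold_pct,
            }
    return None


# The year is an assumption; the other figures mirror the March 12 example.
eval_log = [EvalRun(date(2026, 3, 12), "1.4.2", "claude-opus-4-7", 94.7, 90.0)]
answer = evidence_for(eval_log, date(2026, 3, 12))
gap = evidence_for(eval_log, date(2026, 3, 13))
```

A None result is itself informative: a day with no recorded run is a day for which there is no evidence, and that is worth alerting on as well as low scores.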

What to redact, when, and where

A common worry: "if I store the literal user input and the full model output, am I now sitting on a pile of PII that is itself an audit liability?"

Yes. This is real and it has to be designed for.

The principle: redact at the view, not at the store. The audit log stores raw. Views over the audit log apply role-based redaction: the application UI shows the user a summarised version; the internal operations dashboard shows authorised staff a more complete version; the regulator-facing export, on request, shows the full record with appropriate access controls.

The redaction logic lives in the view layer and is itself auditable. "User X viewed audit record Y on date Z" is an audit event. The record of who has accessed sensitive parts of the audit log is itself audit-grade.

The reason this matters: if your application logs are themselves redacted at the point of storage, you cannot un-redact them later when, for example, a different regulator asks a different question with a wider remit. View-time redaction preserves the option; storage-time redaction destroys it.
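
View-time redaction can be sketched in a few lines. The role names and the field policy below are hypothetical, and the sketch masks fields rather than summarising them; the point is only that the stored record is never modified and that every view leaves its own audit event.

```python
import copy

# Hypothetical policy: payload fields each role may see in full. None means the full record.
VISIBLE_FIELDS = {
    "end_user": {"request_text"},
    "operations": {"request_text", "raw_output", "latency_ms"},
    "regulator_export": None,
}


def view_record(record, viewer_id, role, access_log):
    """Return a role-appropriate copy of an audit record and log the access."""
    allowed = VISIBLE_FIELDS[role]
    payload = copy.deepcopy(record["payload"])  # never mutate the stored row
    if allowed is not None:
        payload = {k: (v if k in allowed else "[REDACTED]") for k, v in payload.items()}
    # "User X viewed audit record Y" is itself an audit event.
    access_log.append({
        "action": "audit_record_viewed",
        "actor_id": viewer_id,
        "role": role,
        "entity_id": record["id"],
    })
    return {**record, "payload": payload}


stored = {
    "id": "row-1",
    "payload": {"request_text": "which apps ...", "raw_output": "Three apps match ...", "latency_ms": 4436},
}
access_log = []
user_view = view_record(stored, "user-1", "end_user", access_log)
full_view = view_record(stored, "auditor-1", "regulator_export", access_log)
```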

A worked example

Imagine an agent that helps an internal user query the application portfolio. The user asks: "which apps in the Finance domain process European personal data and have a contract renewal due before year end?"

A complete audit record for this single interaction looks like this:

trace_id: 7f3a8c2e-...

[14:32:07.123] REQUEST
  actor_id: tarun-...
  actor_type: USER
  payload:
    request_text: "which apps in the Finance domain ..."
    permissions_snapshot: ["portfolio:read", "ai_assistant:use"]
    idp_claims: { tenant: "main", department: "Architecture" }

[14:32:07.456] CONTEXT
  payload:
    system_prompt: "You are an enterprise architecture assistant ..."
    model: "claude-opus-4-7"
    parameters: { effort: "high", thinking: { type: "adaptive" } }
    retrieved_chunks:
      - { id: "app-042", title: "...", content: "...", score: 0.92 }
      - { id: "app-119", title: "...", content: "...", score: 0.88 }
      - { id: "app-208", title: "...", content: "...", score: 0.84 }
    tools: ["search_portfolio", "filter_by_attributes"]

[14:32:11.892] GENERATION
  payload:
    raw_output: "Three apps match your criteria: ..."
    parsed_structure: { matches: ["app-042", "app-119", "app-208"], reasoning: "..." }
    latency_ms: 4436
    input_tokens: 1247
    output_tokens: 312
    tool_calls: []

[14:32:12.001] ACTION
  payload:
    action_type: "render_response"
    target: "chat_session_..."
    automated: true

This is the audit-grade view of one interaction. When the regulator asks about it three years later, every layer can be reconstructed. The system prompt is inlined. The retrieved chunks are inlined. The raw output is inlined. The action is recorded. The trace ID threads everything together.
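
Reconstruction is only possible if every layer actually wrote its row, so a completeness check is worth having. A minimal sketch, assuming rows are available as dictionaries with the column names used in the schema; the function name is illustrative and the timestamps are the ones from the record above.

```python
REQUIRED_LAYERS = ("REQUEST", "CONTEXT", "GENERATION", "ACTION")


def reconstruct(rows, trace_id):
    """Return one trace's rows in time order, plus any layers that never logged."""
    trace = sorted(
        (r for r in rows if r["trace_id"] == trace_id),
        key=lambda r: r["occurred_at"],
    )
    present = {r["layer"] for r in trace}
    missing = [layer for layer in REQUIRED_LAYERS if layer not in present]
    return trace, missing


# Rows deliberately out of order; the trace ID is left elided as in the example.
sample_rows = [
    {"trace_id": "7f3a8c2e-...", "layer": "ACTION", "occurred_at": "14:32:12.001"},
    {"trace_id": "7f3a8c2e-...", "layer": "REQUEST", "occurred_at": "14:32:07.123"},
    {"trace_id": "7f3a8c2e-...", "layer": "GENERATION", "occurred_at": "14:32:11.892"},
    {"trace_id": "7f3a8c2e-...", "layer": "CONTEXT", "occurred_at": "14:32:07.456"},
    {"trace_id": "other", "layer": "REQUEST", "occurred_at": "14:33:00.000"},
]
trace, missing = reconstruct(sample_rows, "7f3a8c2e-...")
other_trace, other_missing = reconstruct(sample_rows, "other")
```

Sorting on the timestamp string works here only because all values share one fixed-width format; against the real table the ordering would come from the TIMESTAMPTZ column.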

The cost of this is storage and a small amount of write-time latency. For a typical enterprise agent, audit log volume is on the order of single-digit megabytes per day. Cheap relative to the value of being able to answer the regulator's question.

Failure modes I have seen in production

A short list of things that look like they work and don't:

Audit logs in the application's main database with the same privileges as the application user. A bug in the application layer that updates audit rows defeats the whole point. The audit log table needs INSERT-only grants. If you can do this with a separate database role on the same database, that's fine; better is a separate write-only log destination (a dedicated event store, an append-only message log, a write-once-read-many store) that the application cannot delete from.
A reasonable audit log for the LLM layer but no link to the database writes the agent caused. The model side is fine. The database side is fine. They are not linked. Always propagate the trace ID into the database writes.
Conversation history stored only on the client side. A web chat that retains conversation in the browser, sends the history to the model with each turn, but does not store the history server-side. When the regulator asks "what was the model told", the answer is "ask the user, they have it in their browser cache". This does not work. Server-side conversation storage is the audit trail.
Citations in the model's response that point to documents the model didn't actually see. This is a hallucination class. It happens. The defence is in the context layer: every citation in the output must be verifiable against the retrieved chunks captured at the context layer. If a citation references a document not present in the context, that is itself an auditable anomaly and should be flagged.
Tool calls treated as part of "the response" rather than as separate audit events. The model's tool calls are decisions. Each one has its own arguments, its own response, its own latency, its own success/failure status. Treating them as opaque steps inside the generation collapses the audit chain. Each tool call needs its own audit row.
The "automated vs human-approved" flag is missing. When the regulator asks "did a human approve this action", the answer needs to be recoverable from the audit log alone. Adding the automated: bool and approved_by_id fields to every action row is cheap and pays for itself the first time anyone asks.
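
The citation check described in the list above is mechanical once the retrieved chunks are captured at the context layer. A minimal sketch, assuming citations appear in the raw output as bracketed chunk IDs such as [app-042]; that marker format is an assumption and would need to match whatever the system prompt actually asks for.

```python
import re

# Hypothetical citation marker format: a chunk ID in square brackets.
CITATION = re.compile(r"\[(app-\d+)\]")


def unverifiable_citations(raw_output, retrieved_chunks):
    """Return cited chunk IDs that were not in the context the model was given."""
    seen = {chunk["id"] for chunk in retrieved_chunks}
    cited = set(CITATION.findall(raw_output))
    return sorted(cited - seen)


chunks = [{"id": "app-042"}, {"id": "app-119"}, {"id": "app-208"}]
clean = unverifiable_citations("See [app-042] and [app-119].", chunks)
flagged = unverifiable_citations("See [app-042] and [app-300].", chunks)
```

A non-empty result is the auditable anomaly: it should produce its own audit row rather than being silently corrected in the rendered response.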

What this gets you, in practice

When the framework above is implemented properly, three things become trivial that were previously hard.

Replayability. Any past decision can be reconstructed. You can re-show, exactly, what the model saw and what it produced. Useful for debugging, useful for retrospective evals, useful when defending the system to an auditor.

Anomaly detection. Anomalies in any of the four layers are detectable. A spike in retrieval-confidence variance. A latency outlier. A tool call with unusual arguments. A run of automated actions without human approval where there usually is one. These are all queryable on the audit log.

Regulatory defensibility. When the regulator arrives, the answer to "show me how this works" is not a slide deck. It is a query against the audit log that produces an exact, timestamped, sourced record. The regulator does not need to trust the policy document; they can read the data.

This last point is the actual goal. Most AI governance work is producing assurance through documentation. Audit instrumentation produces assurance through evidence. Evidence is what gets you through a real audit; documentation is what gets you through a desk-side review.
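
As one illustration of the anomaly-detection point, latency outliers can be flagged straight from the generation rows. The sketch below compares each call against a multiple of the median; the factor of 5 and the function name are assumptions, and a production rule would likely be tuned per model and per workload.

```python
from statistics import median


def latency_outliers(rows, factor=5.0):
    """Return generation rows whose latency exceeds factor times the median latency."""
    latencies = [r["payload"]["latency_ms"] for r in rows]
    if not latencies:
        return []
    baseline = median(latencies)
    return [r for r in rows if r["payload"]["latency_ms"] > factor * baseline]


sample = [
    {"trace_id": "t1", "payload": {"latency_ms": 3000}},
    {"trace_id": "t2", "payload": {"latency_ms": 3100}},
    {"trace_id": "t3", "payload": {"latency_ms": 2900}},
    {"trace_id": "t4", "payload": {"latency_ms": 30000}},  # the 30-second call
]
outliers = latency_outliers(sample)
```

The median is used rather than the mean so that the outlier itself does not drag the baseline upwards.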

Where this fits

Two related pieces on this site:

Meridian: building the EA platform we couldn't buy — describes the broader context and the conversational assistant the audit framework was designed around.
CANVAS: building the approval workflow no commercial product covers — describes the workflow side of the same system, where the action layer of the audit framework is wired into the application workflow.

If you are starting an agent build today and any of the four layers above is missing from your design, stop and add it before you write more application code. Retrofitting audit is more expensive than designing it in. The instrumentation discipline is also the discipline that makes the system itself better — every layer of the audit story is also a layer of the system that can be tested, observed, and improved.