<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Tarun Bulchandani</title>
  <subtitle>Long-form writing on architecture, AI, and enterprise transformation.</subtitle>
  <link href="https://tarun.bulchandanis.com/blog/feed.xml" rel="self" type="application/atom+xml"/>
  <link href="https://tarun.bulchandanis.com/blog/"/>
  <updated>2026-11-05T00:00:00.000Z</updated>
  <id>https://tarun.bulchandanis.com/</id>
  <author>
    <name>Tarun Bulchandani</name>
    <uri>https://tarun.bulchandanis.com/</uri>
  </author>
  <entry>
    <title>Top trends in enterprise architecture 2026</title>
    <link href="https://tarun.bulchandanis.com/blog/top-trends-enterprise-architecture-2026/"/>
    <id>https://tarun.bulchandanis.com/blog/top-trends-enterprise-architecture-2026/</id>
    <updated>2026-11-05T00:00:00.000Z</updated>
    <published>2026-11-05T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Capgemini publishes top-trends pieces for banking, insurance and financial services. BCG runs its AI Radar. PwC publishes its UK economic predictions. Nobody publishes a working architect&#39;s top-trends for the enterprise architecture practice itself. This is mine.</summary>
    <content type="html">&lt;p&gt;Capgemini publishes top-trends pieces for banking,
insurance and financial services each year. BCG runs
the AI Radar. PwC publishes its UK economic
predictions. McKinsey runs its State of AI series.
Nobody publishes a working architect&#39;s top-trends piece
for the enterprise architecture practice itself.&lt;/p&gt;
&lt;p&gt;This is the first annual version of one. It is written
from the practitioner side and is calibrated against
what I see in my own work and in conversations with
peers in the function.&lt;/p&gt;
&lt;h2&gt;Trend 1: the EA function carries more direct delivery weight&lt;/h2&gt;
&lt;p&gt;In 2024, the EA function was largely a reviewing
function: standards, frameworks, governance reviews,
target-state architecture. By the end of 2026, the
firms taking AI seriously have moved the function into
direct delivery on platform components (agent
platform, identity platform, data platform).&lt;/p&gt;
&lt;p&gt;The shift is real and not yet reflected in most EA
function staffing. The function that delivered well in
2024 is under-resourced for the 2026 demand.&lt;/p&gt;
&lt;h2&gt;Trend 2: TOGAF and the equivalent frameworks need adaptation, not replacement&lt;/h2&gt;
&lt;p&gt;TOGAF and the other established EA frameworks were
designed for an environment with slower change, more
deterministic workloads and clearer ownership
boundaries. The agentic era stretches all three. The
frameworks still work but they need adaptation: faster
iteration of the architecture position, explicit
treatment of stochastic workloads, clearer rules for
agent-driven integration.&lt;/p&gt;
&lt;p&gt;The firms claiming TOGAF is dead are overstating the
case. The firms that ignore the adaptation question are
understating it.&lt;/p&gt;
&lt;h2&gt;Trend 3: the commercial EA tool market is restructuring&lt;/h2&gt;
&lt;p&gt;I wrote about this in &lt;a href=&quot;/blog/ea-tool-market-18-months/&quot;&gt;the EA tool market has 18
months&lt;/a&gt; and the
intervening twelve months have, if anything,
accelerated the shift. The major commercial EA tools
are being absorbed into broader platforms or are losing
relevance to AI-augmented internal alternatives. The
EA function has to make a deliberate choice rather than
inherit one.&lt;/p&gt;
&lt;h2&gt;Trend 4: architecture decision records become a living practice again&lt;/h2&gt;
&lt;p&gt;ADRs were a 2010s discipline that decayed in many
firms. The agentic era has reinvigorated the
practice for a specific reason: AI agents can read ADRs
and use them to inform code generation, refactoring
recommendations, and integration design. ADRs that
were a documentation chore are becoming an operational
artefact.&lt;/p&gt;
&lt;p&gt;See &lt;a href=&quot;/blog/architecture-decision-records-generative-ai/&quot;&gt;The evolving role of architecture decision records
in the age of generative
AI&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Trend 5: fitness functions become measurable&lt;/h2&gt;
&lt;p&gt;Architectural fitness functions have been a concept
since the early 2010s but were rarely measured in
practice. The observability investment around agent
platforms has, as a side effect, made many of the
fitness function metrics genuinely measurable. The EA
function can now operate against measured fitness
functions rather than asserted ones. See &lt;a href=&quot;/blog/architectural-fitness-functions-framework/&quot;&gt;Architectural
fitness functions: a practical
framework&lt;/a&gt;.&lt;/p&gt;
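&lt;p&gt;As a rough illustration of what a measured fitness
function can look like, the sketch below evaluates thresholds
against values pulled from observability. The metric names,
thresholds and the &lt;code&gt;FitnessFunction&lt;/code&gt; class are
hypothetical, chosen only for the example; a real
implementation would query the firm&#39;s own metrics store.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FitnessFunction:
    """One architectural characteristic with a measurable threshold."""
    name: str
    measure: Callable[[], float]  # pulls the current value from observability
    threshold: float
    higher_is_better: bool = False

    def evaluate(self) -> dict:
        value = self.measure()
        if self.higher_is_better:
            passed = value >= self.threshold
        else:
            passed = self.threshold >= value
        return {"name": self.name, "value": value,
                "threshold": self.threshold, "passed": passed}


def run_fitness_functions(functions: list[FitnessFunction]) -> list[dict]:
    """Evaluate every function and return the results for reporting."""
    return [f.evaluate() for f in functions]


# Hypothetical measurements; in practice these would query the metrics store.
checks = [
    FitnessFunction("p95_tool_call_latency_ms", lambda: 420.0, threshold=500.0),
    FitnessFunction("agent_trace_coverage_ratio", lambda: 0.91,
                    threshold=0.95, higher_is_better=True),
]
results = run_fitness_functions(checks)
failed = [r["name"] for r in results if not r["passed"]]
```

&lt;p&gt;The point of the shape is that each function carries its
own measurement source, so the result is measured rather
than asserted.&lt;/p&gt;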
&lt;h2&gt;Trend 6: enterprise architecture and the regulatory function move closer together&lt;/h2&gt;
&lt;p&gt;The convergence is structural. The regulatory function
has more direct dependency on architectural choices
(AI use cases, data residency, agent governance, model
inventory). The architecture function has more direct
exposure to regulatory enforcement (SS1/23, EU AI Act,
operational resilience). The two functions have to
work as one team, not as two adjacent functions.&lt;/p&gt;
&lt;p&gt;Firms that have not made this organisational shift will
do so in 2026 or 2027.&lt;/p&gt;
&lt;h2&gt;Trend 7: MCP and equivalent standards become the operating norm&lt;/h2&gt;
&lt;p&gt;Twelve months ago, MCP was a curiosity. Twelve months
from now, it will be assumed. The firms that have not
adopted it will be the exception, and the cost of being
the exception will be measurable. See &lt;a href=&quot;/blog/mcp-enterprise-standard/&quot;&gt;MCP is the most
important enterprise standard nobody is
implementing&lt;/a&gt; for the
context.&lt;/p&gt;
&lt;h2&gt;Trend 8: the platform-team-vs-product-team boundary gets redrawn&lt;/h2&gt;
&lt;p&gt;The DevOps consolidation of the late 2010s blurred the
boundary between platform teams and product teams. The
agentic era has prompted a clearer redraw: platform
teams own the foundational components (model serving,
agent platform, observability, identity); product teams
own the use cases that consume them. The architecture
function has to be explicit about which is which.&lt;/p&gt;
&lt;h2&gt;Trend 9: the architecture function develops a buy-side discipline&lt;/h2&gt;
&lt;p&gt;The vendor selection decisions in 2026 are larger and
more consequential than in any previous EA cycle:
foundation model vendors, agent platform vendors, SaaS
vendors with embedded AI. The architecture function has
to develop a buy-side discipline that operates at
the level the decisions require, with proper criteria,
proper diligence and proper negotiation support. Most
firms have not invested in this capability.&lt;/p&gt;
&lt;h2&gt;Trend 10: the architect-as-builder model gains traction&lt;/h2&gt;
&lt;p&gt;A small but growing number of architecture leaders are
shipping code, not just specifications. The Meridian
and CANVAS systems I built at Sonnedix sit in this
category. The pattern is not appropriate for every
firm or every architect, but where it works it
delivers materially faster than the
specification-driven model. See &lt;a href=&quot;/blog/case-study-meridian/&quot;&gt;the Meridian case
study&lt;/a&gt; and &lt;a href=&quot;/blog/case-study-canvas/&quot;&gt;the CANVAS case
study&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Where this leaves the function&lt;/h2&gt;
&lt;p&gt;The EA function in 2026 is materially different from
the function in 2022. More delivery weight, more
regulatory exposure, more direct ownership of platform
components, more accountability for vendor decisions.&lt;/p&gt;
&lt;p&gt;The firms whose EA functions adapt to this will carry
the agentic transition well. The firms whose EA functions
remain in the reviewing posture will struggle.&lt;/p&gt;
&lt;p&gt;This piece will be revisited annually. The 2027 version
will mark which of these trends accelerated, which
plateaued, and which were overstated.&lt;/p&gt;
&lt;p&gt;Related reading: &lt;a href=&quot;/blog/cio-ai-agenda-2026/&quot;&gt;The CIO&#39;s AI agenda for
2026&lt;/a&gt;, &lt;a href=&quot;/blog/banking-fs-architecture-top-trends-2026/&quot;&gt;Banking and
financial services architecture top trends
2026&lt;/a&gt;,
&lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;A reference architecture for agentic AI in the
regulated
enterprise&lt;/a&gt;,
&lt;a href=&quot;/blog/ea-tool-market-18-months/&quot;&gt;The commercial EA tool market has 18
months&lt;/a&gt;, &lt;a href=&quot;/blog/acquisition-heavy-architecture/&quot;&gt;What an
acquisition-heavy company actually needs from its
architects&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>The CIO&#39;s AI agenda for 2026: an architect&#39;s read</title>
    <link href="https://tarun.bulchandanis.com/blog/cio-ai-agenda-2026/"/>
    <id>https://tarun.bulchandanis.com/blog/cio-ai-agenda-2026/</id>
    <updated>2026-10-29T00:00:00.000Z</updated>
    <published>2026-10-29T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Capgemini&#39;s &#39;when IT meets AI: the CIO perspective&#39; and Bain&#39;s CIO conversation series both frame the CIO&#39;s AI agenda. The architecture function&#39;s read of the same agenda is more specific. Seven workstreams the CIO has to fund and the architecture function has to deliver.</summary>
    <content type="html">&lt;p&gt;Capgemini publishes &amp;quot;when IT meets AI: the CIO
perspective&amp;quot; pieces; Bain runs its CIO conversation
series; McKinsey publishes its CIO-track work. Each
piece is calibrated to the executive audience. The
architecture function&#39;s read of the same agenda is
specific: which workstreams the CIO has to fund, who has
to own them, and what the architecture function has to
deliver.&lt;/p&gt;
&lt;p&gt;This piece is that read, organised around the seven
workstreams I think the CIO actually has to land in
2026.&lt;/p&gt;
&lt;h2&gt;Workstream 1: the agentic AI platform&lt;/h2&gt;
&lt;p&gt;This is by far the largest single line in the 2026 IT
budget at firms taking AI seriously. The platform
components (registry, tool gateway, observability,
policy engine) are covered in &lt;a href=&quot;/blog/platform-strategy-agentic-ai/&quot;&gt;the platform strategy
piece&lt;/a&gt; and the
&lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;reference architecture
piece&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Ownership: architecture function, with engineering
delivery.&lt;/p&gt;
&lt;p&gt;What goes wrong: the platform is under-funded relative
to the use cases it has to support. The use case teams
build their own and the firm ends up with multiple
incompatible stacks.&lt;/p&gt;
&lt;h2&gt;Workstream 2: the AI governance regime&lt;/h2&gt;
&lt;p&gt;Model risk management, agent risk management, AI
incident response, model and agent registries, audit
support, regulatory engagement.&lt;/p&gt;
&lt;p&gt;Ownership: shared between the architecture function,
the model risk function and the compliance function.
The CIO&#39;s job is to fund the shared infrastructure.&lt;/p&gt;
&lt;p&gt;What goes wrong: each function builds its own AI
governance. The firm has three governance regimes that
do not reconcile and three teams duplicating effort.&lt;/p&gt;
&lt;h2&gt;Workstream 3: the legacy modernisation portfolio&lt;/h2&gt;
&lt;p&gt;The pre-existing modernisation programme, which has not
gone away. ERP transformations, core banking
modernisations, legacy mainframe migration, end-of-life
software replacement.&lt;/p&gt;
&lt;p&gt;The agentic shift has changed the integration demands
and the data demands of these programmes. The
modernisation portfolio has to be re-baselined.&lt;/p&gt;
&lt;p&gt;Ownership: each programme&#39;s own leadership, with
architecture function oversight.&lt;/p&gt;
&lt;p&gt;What goes wrong: the modernisation programmes proceed
on their original assumptions and have to be re-cut
mid-flight. The cost of mid-flight re-cuts is
materially higher than the cost of upfront re-baselining.&lt;/p&gt;
&lt;h2&gt;Workstream 4: the cyber and identity uplift&lt;/h2&gt;
&lt;p&gt;The agent population is, structurally, a population of
non-human identities. The legacy identity treatment
does not work for them. The cyber control surface for
agent-driven workflows is different from the
human-driven equivalent.&lt;/p&gt;
&lt;p&gt;Ownership: CISO and architecture function jointly.&lt;/p&gt;
&lt;p&gt;What goes wrong: the cyber uplift is treated as
incremental to the existing cyber programme. The
specific agent-driven threats are not designed for.
See &lt;a href=&quot;/blog/non-human-identity-ai-agents-ea-pattern/&quot;&gt;Non-human identity in the age of AI
agents&lt;/a&gt;
and &lt;a href=&quot;/blog/cyber-guardrails-ai-agents-regulated/&quot;&gt;Cyber guardrails for AI agents in regulated
workflows&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Workstream 5: the data platform consolidation&lt;/h2&gt;
&lt;p&gt;Integrated reporting, ESG data, operational analytics
and the agentic AI workloads all require a coherent data
platform. Most firms are running multiple incompatible
data platforms accumulated over the last decade.&lt;/p&gt;
&lt;p&gt;Ownership: architecture function and the data
function, where the data function exists.&lt;/p&gt;
&lt;p&gt;What goes wrong: the consolidation is deferred because
the immediate cost is visible and the benefit accrues
gradually. The firm ends up paying both the legacy
cost and the new platform cost concurrently for years.
See &lt;a href=&quot;/blog/integrated-reporting-ea-function/&quot;&gt;Integrated reporting and the enterprise
architecture function&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Workstream 6: the vendor and outsource discipline&lt;/h2&gt;
&lt;p&gt;Foundation model vendors, AI tool vendors, SaaS
vendors with embedded AI features. Each is, in
regulated firms, an outsource arrangement. The vendor
selection has to apply the outsource discipline rather
than the technology procurement discipline.&lt;/p&gt;
&lt;p&gt;Ownership: architecture function, supplier management
function, regulatory function.&lt;/p&gt;
&lt;p&gt;What goes wrong: the vendor selection happens in
procurement on standard procurement terms. The
outsource discipline is applied retroactively, often
after the contract is signed. The renegotiation is
expensive and sometimes infeasible.&lt;/p&gt;
&lt;h2&gt;Workstream 7: the cost discipline&lt;/h2&gt;
&lt;p&gt;Agent workloads can become expensive quickly. The
foundation model costs, the inference compute costs,
the observability storage costs, the audit retention
costs all compound. Without cost attribution and cost
control, the firm finds out about the cost surprise
after the fact.&lt;/p&gt;
&lt;p&gt;Ownership: CIO, FinOps function, architecture function.&lt;/p&gt;
&lt;p&gt;What goes wrong: cost attribution is built after the
cost surprise rather than before it. The use case
teams have no incentive to manage their consumption.&lt;/p&gt;
&lt;h2&gt;The funding shape&lt;/h2&gt;
&lt;p&gt;A 2026 IT budget that does these seven workstreams well
looks materially different from a 2024 budget. Three
shifts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;More platform investment, less use-case investment.&lt;/strong&gt;
The platform components above pay back across many use
cases. Funding them properly is more efficient than
funding each use case to build its own.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;More governance investment.&lt;/strong&gt; The model risk function,
the regulatory function and the architecture function
all need more capacity than the 2024 baseline.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;More observability and cost discipline.&lt;/strong&gt; The
operational characteristics of the agent estate
require investment in the run-time discipline, not
just the build-time delivery.&lt;/p&gt;
&lt;h2&gt;Where this leaves the CIO&lt;/h2&gt;
&lt;p&gt;The CIO that funds these seven workstreams well in
2026 puts the firm in a position to capture the
agentic AI value the consultancy commentary points at.
The CIO that funds the use cases without funding the
underlying workstreams will spend 2027 and 2028
rebuilding.&lt;/p&gt;
&lt;p&gt;The architecture function&#39;s job is to make the case
for the underlying workstreams clearly and
defensibly. The CIO&#39;s job is to back the case.&lt;/p&gt;
&lt;p&gt;Related reading: &lt;a href=&quot;/blog/banking-fs-architecture-top-trends-2026/&quot;&gt;Banking and financial services
architecture top trends
2026&lt;/a&gt;,
&lt;a href=&quot;/blog/platform-strategy-agentic-ai/&quot;&gt;Platform strategy for agentic
AI&lt;/a&gt;, &lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;A reference
architecture for agentic AI in the regulated
enterprise&lt;/a&gt;,
&lt;a href=&quot;/blog/top-trends-enterprise-architecture-2026/&quot;&gt;Top trends in enterprise architecture
2026&lt;/a&gt;,
&lt;a href=&quot;/blog/ea-tool-market-18-months/&quot;&gt;The commercial EA tool market has 18
months&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Integrated reporting and the enterprise architecture function</title>
    <link href="https://tarun.bulchandanis.com/blog/integrated-reporting-ea-function/"/>
    <id>https://tarun.bulchandanis.com/blog/integrated-reporting-ea-function/</id>
    <updated>2026-10-22T00:00:00.000Z</updated>
    <published>2026-10-22T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>EY&#39;s &#39;how integrated reporting can give you the whole story&#39; piece sets the case from the CFO&#39;s vantage. The architecture function carries more of the delivery than that framing implies, particularly as ESG and operational data become integrated reporting inputs.</summary>
    <content type="html">&lt;p&gt;EY published &amp;quot;how integrated reporting can give you the
whole story&amp;quot; earlier this year. The piece argues that
the conventional separation between financial reporting,
ESG reporting, operational reporting and strategic
narrative no longer serves the audiences that consume
them. The integrated reporting concept (financial,
non-financial, narrative woven together) is the
proposed response.&lt;/p&gt;
&lt;p&gt;The piece is written for the CFO and the audit
committee. The architecture function carries more of
the delivery than that framing implies, particularly
as ESG and operational data move from manual
spreadsheet collection to machine-readable, auditable,
real-time feeds.&lt;/p&gt;
&lt;p&gt;This piece sets out what the architecture function has
to deliver to make integrated reporting genuinely
useful.&lt;/p&gt;
&lt;h2&gt;What integrated reporting actually requires&lt;/h2&gt;
&lt;p&gt;Three categories of data have to flow into the same
canonical reporting layer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Financial data.&lt;/strong&gt; The general ledger, the
sub-ledgers, the consolidation engine. This is
well-established in the existing finance estate.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. ESG and sustainability data.&lt;/strong&gt; Emissions data
(Scope 1, 2 and 3), energy consumption, water,
diversity metrics, governance metrics, supply chain
visibility. In most firms, this data is collected
through annual surveys, manual spreadsheets and
opportunistic systems. The CSRD, ISSB and SEC climate
disclosure frameworks have made the requirement firmer;
the underlying data discipline is often still weak.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Operational and strategic data.&lt;/strong&gt; Customer
metrics, employee metrics, operational KPIs, strategic
programme status. Scattered across CRM, HR systems,
operational dashboards and the strategy team&#39;s
spreadsheets.&lt;/p&gt;
&lt;p&gt;The integrated reporting frame requires these to
reconcile against each other, to be retrievable on the
same cadence, and to support the same audit standards.&lt;/p&gt;
&lt;h2&gt;The architecture problems&lt;/h2&gt;
&lt;p&gt;Four problems show up in every implementation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The data dictionary is inconsistent.&lt;/strong&gt; The same
concept is named differently in different systems.
&amp;quot;Active customer&amp;quot; in CRM does not match &amp;quot;billed
customer&amp;quot; in the billing engine which does not match
&amp;quot;recognised revenue customer&amp;quot; in the general ledger.
Without a defined data dictionary, the reports do not
reconcile.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The cadence does not align.&lt;/strong&gt; Financial close is
monthly; ESG data collection is annual; operational
data is daily. Integrated reporting requires a common
cadence at least for the periods being reported, which
forces investment in the slower data streams.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The audit standards differ.&lt;/strong&gt; Financial data has
been audited to a clear standard for decades. ESG data
is moving toward audit, but the standards are still
evolving. Operational data is rarely audited. The
integrated report has to handle the variation
explicitly rather than gloss it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The system boundaries do not match the reporting
boundaries.&lt;/strong&gt; The financial entity structure, the
operational footprint, the legal entity structure and
the ESG reporting boundary are all different. The
data layer has to support translation between them.&lt;/p&gt;
&lt;h2&gt;What the architecture function has to deliver&lt;/h2&gt;
&lt;p&gt;Five components.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A common data dictionary.&lt;/strong&gt; The architecture function
defines, and the data governance function maintains, a
common dictionary of concepts used in reporting.
Includes financial concepts (revenue, EBITDA),
operational concepts (active customers, churn), and
ESG concepts (emissions intensity, water withdrawal).
The dictionary is authoritative; the source systems
reconcile to it, not the other way around.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;An integrated data fabric.&lt;/strong&gt; A data layer that
exposes the dictionary concepts across the source
systems. The implementation varies (warehouse,
lakehouse, mesh) but the requirement is the same:
auditable, queryable, versioned, governed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A reporting cadence calendar.&lt;/strong&gt; The defined cadence
for each concept (daily, monthly, quarterly, annually)
and the dependencies between them (operational data
feeds quarterly reporting, financial close feeds
annual reporting).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;An audit trail.&lt;/strong&gt; Every figure in the integrated
report has to be traceable to its underlying data with
the relevant audit evidence. The architecture function
has to build this traceability into the data fabric;
retrofitting it during the audit cycle is materially
more expensive.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A change governance layer.&lt;/strong&gt; Reporting concepts will
change (new ESG standards, new operational metrics,
new strategic programmes). The architecture function
has to manage these changes without breaking
year-over-year comparability or audit trail
continuity.&lt;/p&gt;
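&lt;p&gt;To make the first of these components concrete, here is
a minimal sketch of a common data dictionary in which source
system fields reconcile to canonical concepts, not the other
way around. The class names, field names and definitions are
hypothetical and chosen only for illustration; they are not
taken from any particular product.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A canonical reporting concept and the source fields that map to it."""
    name: str
    definition: str
    cadence: str  # daily, monthly, quarterly, annually
    sources: dict[str, str] = field(default_factory=dict)  # system -> field


class DataDictionary:
    def __init__(self) -> None:
        self._concepts: dict[str, Concept] = {}

    def register(self, concept: Concept) -> None:
        self._concepts[concept.name] = concept

    def resolve(self, system: str, source_field: str) -> str:
        """Return the canonical concept a source field reconciles to."""
        for concept in self._concepts.values():
            if concept.sources.get(system) == source_field:
                return concept.name
        raise KeyError(f"{system}.{source_field} is not mapped to a concept")

    def unmapped(self, system: str, fields: list[str]) -> list[str]:
        """List source fields that have no canonical concept yet."""
        mapped = {c.sources.get(system) for c in self._concepts.values()}
        return [f for f in fields if f not in mapped]


# Hypothetical entries: two similar-sounding concepts kept deliberately distinct.
dictionary = DataDictionary()
dictionary.register(Concept(
    name="active_customer",
    definition="Customer with a logged interaction in the reporting period",
    cadence="monthly",
    sources={"crm": "active_flag"},
))
dictionary.register(Concept(
    name="billed_customer",
    definition="Customer with an issued invoice in the reporting period",
    cadence="monthly",
    sources={"billing": "billed_customer_id"},
))
canonical = dictionary.resolve("billing", "billed_customer_id")
gaps = dictionary.unmapped("crm", ["active_flag", "segment_code"])
```

&lt;p&gt;The &lt;code&gt;unmapped&lt;/code&gt; check is the useful part in
practice: it surfaces source fields that are feeding reports
without an agreed definition.&lt;/p&gt;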
&lt;h2&gt;Where firms underspend&lt;/h2&gt;
&lt;p&gt;Two areas.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The ESG data fabric.&lt;/strong&gt; Most firms are still treating
ESG data as a separate workstream with its own systems
and its own annual cadence. The integrated reporting
frame requires it to land in the same data fabric as
the financial data, with the same governance. Few
firms have made this investment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The traceability layer.&lt;/strong&gt; Most firms can produce the
integrated report; few can defend the specific numbers
in it to the granularity an integrated audit will
require. The architecture function should be
investing in traceability before the audit pressure
arrives.&lt;/p&gt;
&lt;h2&gt;Where this leaves the firm&lt;/h2&gt;
&lt;p&gt;Integrated reporting is, at the executive layer, a
narrative project. At the architecture layer, it is a
data platform project with specific governance
requirements. The firms that get it right invest in
the data platform first and the narrative second.&lt;/p&gt;
&lt;p&gt;For firms doing this work in 2026, my recommendation is
to start with the common data dictionary, then invest
in the ESG data fabric, then build the traceability
layer. The integrated report itself follows from those
three.&lt;/p&gt;
&lt;p&gt;Related reading: &lt;a href=&quot;/blog/architectural-fitness-functions-framework/&quot;&gt;Architectural fitness functions: a
practical
framework&lt;/a&gt;,
&lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;A reference architecture for agentic AI in the
regulated
enterprise&lt;/a&gt;,
&lt;a href=&quot;/blog/top-trends-enterprise-architecture-2026/&quot;&gt;Top trends in enterprise architecture
2026&lt;/a&gt;,
&lt;a href=&quot;/blog/large-scale-erp-transformation-lessons/&quot;&gt;Lessons from large-scale ERP
transformation&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>The intelligent superhighway, translated: what AI-ready cloud foundations actually mean</title>
    <link href="https://tarun.bulchandanis.com/blog/intelligent-superhighway-translated/"/>
    <id>https://tarun.bulchandanis.com/blog/intelligent-superhighway-translated/</id>
    <updated>2026-10-15T00:00:00.000Z</updated>
    <published>2026-10-15T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Accenture&#39;s &#39;intelligent superhighway&#39; and &#39;AI innovation is nonstop, your cloud foundation should be too&#39; pieces are the cleanest articulation of the AI-ready cloud foundation narrative. The architecture function still has to translate the narrative into specific design decisions.</summary>
    <content type="html">&lt;p&gt;Accenture&#39;s &amp;quot;intelligent superhighway&amp;quot; framing and the
companion &amp;quot;AI innovation is nonstop, your cloud
foundation should be too&amp;quot; piece are the cleanest
articulation of the AI-ready cloud foundation narrative
in the public commentary. The pieces make the
strategic case well: the enterprise cloud strategies of
the last decade were written for a workload mix that no
longer reflects what the firm actually runs.&lt;/p&gt;
&lt;p&gt;The strategic case is correct. The architecture function
still has to translate &amp;quot;intelligent superhighway&amp;quot; into
specific design decisions. This piece does the
translation.&lt;/p&gt;
&lt;h2&gt;What the marketing language actually means&lt;/h2&gt;
&lt;p&gt;Five elements turn up in every working AI-ready cloud
foundation. The marketing language sometimes obscures
what each one is for.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. A unified data layer.&lt;/strong&gt; The classical enterprise
estate has data scattered across operational systems,
analytical warehouses, data lakes and the various
SaaS systems the firm has accumulated. The &amp;quot;unified
data layer&amp;quot; is the architecture function&#39;s commitment
to a single canonical view of the firm&#39;s data,
accessible by agents at low latency and respecting the
firm&#39;s data governance.&lt;/p&gt;
&lt;p&gt;In practice: a properly designed data mesh or data fabric
with explicit ownership, explicit quality contracts and
explicit access controls. The technology choice (data
mesh vs lakehouse vs warehouse) matters less than the
governance discipline.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. A low-latency model serving layer.&lt;/strong&gt; Foundation
models and bespoke models served close to the
operational data, with predictable latency and
predictable cost. For most regulated firms, this is the
vendor&#39;s enterprise tenant in the right jurisdiction
rather than the vendor&#39;s public API.&lt;/p&gt;
&lt;p&gt;In practice: model deployment in the firm&#39;s cloud
tenancy, with the same operational discipline as any
other production workload (capacity planning, SLO
monitoring, incident response).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. A scalable tool gateway.&lt;/strong&gt; Agents calling tools at
scale require the tool gateway to handle the volume.
Most existing enterprise integration platforms are not
designed for this; they were built for transactional
volumes, not for agent-driven volumes that can spike
non-linearly.&lt;/p&gt;
&lt;p&gt;In practice: a purpose-built tool gateway with strong
rate limiting, queueing, observability and circuit
breaking. The architecture function should expect to
build this rather than buy it; the commercial options
are still maturing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. A robust observability layer.&lt;/strong&gt; The agent
reasoning traces, tool calls, output samples and
override events have to be captured, indexed and
queryable. The volume is materially higher than
classical application observability.&lt;/p&gt;
&lt;p&gt;In practice: an observability stack tuned for the agent
workload. Most firms underestimate the storage and
indexing cost of this by a factor of three to five.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. A governance fabric.&lt;/strong&gt; Identity, authorisation,
audit, change control, model and agent registries.
Wired through the rest of the platform so the agent
operating envelope is enforced consistently.&lt;/p&gt;
&lt;p&gt;In practice: the agent platform components I have
covered in &lt;a href=&quot;/blog/platform-strategy-agentic-ai/&quot;&gt;the platform strategy
piece&lt;/a&gt; and the
&lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;reference architecture
piece&lt;/a&gt;.&lt;/p&gt;
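&lt;p&gt;The tool gateway in element 3 is the piece most likely
to be built in-house, so a sketch may help. The code below
shows two of the named controls, rate limiting (as a token
bucket) and circuit breaking, in front of a tool call.
Queueing and observability are left out for brevity. All
class and function names are hypothetical, and the limits
are placeholders, not recommendations.&lt;/p&gt;

```python
import time
from typing import Callable


class TokenBucket:
    """Rate limiter: `rate` calls per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        refill = (now - self.last) * self.rate
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then waits `reset_after` seconds."""

    def __init__(self, max_failures: int, reset_after: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Half-open: let one call through; a single failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return False
        return True

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_tool(tool: Callable[[], str], bucket: TokenBucket,
              breaker: CircuitBreaker) -> str:
    """Gateway entry point: check breaker and rate limit before invoking the tool."""
    if breaker.is_open():
        return "rejected: circuit open"
    if not bucket.allow():
        return "rejected: rate limited"
    try:
        result = tool()
    except Exception:
        breaker.record(success=False)
        return "failed"
    breaker.record(success=True)
    return result
```

&lt;p&gt;The clock is injected so the behaviour can be tested
without sleeping. In a real gateway the bucket and breaker
would typically be held per tool and per agent, not
globally.&lt;/p&gt;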
&lt;h2&gt;What &amp;quot;always-on innovation&amp;quot; actually requires&lt;/h2&gt;
&lt;p&gt;The &amp;quot;your cloud foundation should be too&amp;quot; framing
points at something specific: the cloud foundation has
to support continuous deployment of new models, new
prompts, new agents, new tool integrations without
destabilising production.&lt;/p&gt;
&lt;p&gt;Three operational characteristics turn out to be
necessary.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Continuous deployment with rollback.&lt;/strong&gt; New agent
versions deploy through a defined pipeline. Rollback is
single-command. The audit trail of which version was
running when is preserved.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Shadow deployment.&lt;/strong&gt; New models, new prompts and new
agent versions run alongside the production version on
sampled traffic. Performance is compared before
promotion. Most firms have not built this for the agent
workload.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Canary and blast-radius control.&lt;/strong&gt; New agent
versions reach a small fraction of the production
volume first. The blast radius of a regression is
bounded. Promotion to full volume is a deliberate
decision.&lt;/p&gt;
&lt;p&gt;These are not new ideas; they are well-established
deployment patterns from the conventional application
estate. They have to be specifically applied to the
agent estate and have to be funded.&lt;/p&gt;
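&lt;p&gt;For the canary pattern specifically, one common approach
is a deterministic hash split, sketched below. It assumes
each request carries a stable identifier; the function names
and version labels are hypothetical. Rollback in this model
is setting the canary fraction back to zero.&lt;/p&gt;

```python
import hashlib


def canary_bucket(request_id: str) -> float:
    """Map a request id to a stable value in [0, 1) so routing is repeatable."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


def choose_version(request_id: str, stable: str, canary: str,
                   canary_fraction: float) -> str:
    """Send roughly `canary_fraction` of traffic to the canary version."""
    if canary_fraction > canary_bucket(request_id):
        return canary
    return stable


# Route 10,000 hypothetical requests with a 5% canary.
routed = [choose_version(f"req-{i}", "agent-v1", "agent-v2", 0.05)
          for i in range(10_000)]
canary_share = routed.count("agent-v2") / len(routed)
```

&lt;p&gt;Because the split is derived from the request id and not
from a random draw, the same request always reaches the
same version, which keeps the audit trail of which version
handled what reproducible.&lt;/p&gt;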
&lt;h2&gt;Where firms underspend&lt;/h2&gt;
&lt;p&gt;Three areas turn up reliably as underspent.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Observability.&lt;/strong&gt; As above. The volume is a surprise
the first time it lands.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cost attribution.&lt;/strong&gt; Agent workloads can become
expensive quickly. Without per-agent, per-use-case cost
attribution, the firm cannot make informed decisions
about where to invest and where to retire. Most firms
build cost attribution after the cost surprise rather
than before it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pipeline tooling.&lt;/strong&gt; The continuous deployment, shadow
deployment and canary patterns above require pipeline
tooling that most existing enterprise CI/CD systems do
not provide out of the box. The investment is real and
is rarely budgeted in the original AI-platform business
case.&lt;/p&gt;
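&lt;p&gt;Cost attribution can start small. The sketch below, in
Python, sums model spend per agent and per use case from call
records. The record fields and the per-thousand-token prices
are hypothetical; real pricing depends on the vendor contract.&lt;/p&gt;

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCall:
    # Hypothetical record shape; a real platform would read this from
    # the observability layer.
    agent_id: str
    use_case: str
    input_tokens: int
    output_tokens: int


def attribute_costs(calls, price_in_per_1k, price_out_per_1k):
    """Sum model spend per (agent, use case) pair."""
    totals = defaultdict(float)
    for call in calls:
        cost = (call.input_tokens / 1000) * price_in_per_1k
        cost += (call.output_tokens / 1000) * price_out_per_1k
        totals[(call.agent_id, call.use_case)] += cost
    return dict(totals)
```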
&lt;h2&gt;Where this leaves the firm&lt;/h2&gt;
&lt;p&gt;The AI-ready cloud foundation is, on the whole, a
recognisable architecture pattern with new specifics.
The architecture function&#39;s job is to be clear about the
specifics, to fund the components that matter, and to
resist the marketing-language abstraction when
delivering the work.&lt;/p&gt;
&lt;p&gt;For firms doing this work in 2026, my recommendation is
to invest in observability and cost attribution before
scaling the agent footprint, to build the deployment
pipeline tooling as a platform investment rather than a
per-use-case investment, and to keep the model serving
layer as substitutable as the architecture can support.&lt;/p&gt;
&lt;p&gt;Related reading: &lt;a href=&quot;/blog/platform-strategy-agentic-ai/&quot;&gt;Platform strategy for agentic
AI&lt;/a&gt;, &lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;A reference
architecture for agentic AI in the regulated
enterprise&lt;/a&gt;,
&lt;a href=&quot;/blog/data-residency-ai-workloads-uk-eu/&quot;&gt;Data residency for AI workloads: a working pattern for
UK and EU
enterprises&lt;/a&gt;,
&lt;a href=&quot;/blog/cio-ai-agenda-2026/&quot;&gt;The CIO&#39;s AI agenda for 2026: an architect&#39;s
read&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Platform strategy for agentic AI: a working reference architecture</title>
    <link href="https://tarun.bulchandanis.com/blog/platform-strategy-agentic-ai/"/>
    <id>https://tarun.bulchandanis.com/blog/platform-strategy-agentic-ai/</id>
    <updated>2026-10-08T00:00:00.000Z</updated>
    <published>2026-10-08T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Accenture&#39;s &#39;rewriting platform strategy for agentic AI&#39; piece argues the right strategic case. The piece does not, and could not, deliver the architectural specificity a practising architect needs. This is the working platform strategy I use when actually building.</summary>
    <content type="html">&lt;p&gt;Accenture published &amp;quot;rewriting platform strategy for
agentic AI&amp;quot; earlier this year. The article makes the
right strategic case: the existing enterprise platform
strategy was written for a different kind of workload,
and the agentic shift requires a substantive rewrite,
not a marginal update.&lt;/p&gt;
&lt;p&gt;The article is, necessarily, written at the strategic
narrative layer. The architecture function that has to
deliver the rewrite needs a different kind of document.
This piece is that document.&lt;/p&gt;
&lt;h2&gt;What the existing platform strategy assumed&lt;/h2&gt;
&lt;p&gt;Five assumptions ran through most enterprise platform
strategies of the last decade.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Workloads are deterministic.&lt;/strong&gt; The same input
produces the same output. Where it does not, the
variance is a bug to be fixed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Authentication is for users.&lt;/strong&gt; The principal in
the authorisation flow is a human. Service-to-service
authentication is a special case.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Tools are called by code, in fixed sequences.&lt;/strong&gt;
The integration patterns are choreographed at design
time. Runtime composition is rare.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Audit trails are about who saw what.&lt;/strong&gt; Read
access, write access, configuration change. The
trail captures human-readable causation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. Capacity planning is about peak load.&lt;/strong&gt; The peak
is forecastable from historical patterns and is assumed
to grow linearly.&lt;/p&gt;
&lt;p&gt;The agentic shift breaks all five.&lt;/p&gt;
&lt;h2&gt;What the new platform strategy has to assume&lt;/h2&gt;
&lt;p&gt;The replacement assumptions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Workloads are stochastic.&lt;/strong&gt; The same input does
not produce the same output. The variance is a feature.
The architecture has to support reasoning about
behaviour across the distribution, not just at the
modal output.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Authentication is for principals of multiple
kinds.&lt;/strong&gt; Humans, services, agents, customer agents,
delegated principals. The authorisation flow has to
support all of them, with different control surfaces
for each.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Tools are called by agents, in sequences
determined at runtime.&lt;/strong&gt; Integration patterns are
composed dynamically. The architecture has to support
this without losing the safety properties.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Audit trails are about what the agent saw, what
it considered, what it decided and why.&lt;/strong&gt; Reasoning
traces become first-class data. Storage and
retrievability change.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. Capacity planning is about reasoning depth and
breadth.&lt;/strong&gt; A single agent invocation can multiply into
many tool calls and many sub-agent invocations. The
capacity model has to account for this non-linearly.&lt;/p&gt;
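&lt;p&gt;To make the non-linearity concrete, here is a
back-of-envelope sketch in Python. It assumes, purely for
illustration, that every agent makes the same number of tool
calls and spawns the same number of sub-agents down to a fixed
depth. Real workloads are far less uniform.&lt;/p&gt;

```python
def fan_out(tool_calls_per_agent: int, sub_agents_per_agent: int, depth: int):
    """Return (agent invocations, tool calls) triggered by one top-level request."""
    # Level 0 is the top-level agent; each level multiplies by the branching factor.
    agents = sum(sub_agents_per_agent ** level for level in range(depth + 1))
    return agents, agents * tool_calls_per_agent
```

&lt;p&gt;With five tool calls and three sub-agents per agent, two
levels of delegation turn one request into 13 agent invocations
and 65 tool calls.&lt;/p&gt;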
&lt;h2&gt;The five-component platform&lt;/h2&gt;
&lt;p&gt;The platform that supports these new assumptions has
five components, each with a clear role and a clear
ownership boundary.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Component 1: The agent runtime.&lt;/strong&gt; Where agents
execute. Loads the model, manages the reasoning loop,
invokes tools, returns outputs. This is the layer with
the most third-party options; the architecture function
should treat it as substitutable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Component 2: The tool gateway.&lt;/strong&gt; Mediates every tool
call from every agent. Enforces authorisation, logs the
call, applies rate limits, handles failures. This is
the layer with the most leverage; the architecture
function should own it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Component 3: The agent registry.&lt;/strong&gt; Source of truth
for every agent in the firm. Configuration, purpose,
authorisation policy, ownership, lifecycle status. See
&lt;a href=&quot;/blog/model-agent-registries-governance-artefact/&quot;&gt;Model and agent
registries&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Component 4: The observability layer.&lt;/strong&gt; Captures
every reasoning trace, every tool call, every output.
Indexed for query, retained for the regulatory window.
This is the largest data layer the platform produces;
the architecture function should design for the volume
explicitly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Component 5: The policy and override surface.&lt;/strong&gt;
Where humans intervene. Policy authors define what
agents can do; operators override what specific
agents do at runtime; auditors review what happened
after the fact. Three distinct user roles, one
underlying control plane.&lt;/p&gt;
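&lt;p&gt;The tool gateway (Component 2) is the easiest of the five to
sketch. The Python below is a simplified illustration under my
own assumptions: the class and field names are hypothetical, the
rate limit is a plain per-agent call cap rather than a time
window, and the audit log is an in-memory list.&lt;/p&gt;

```python
import time
from dataclasses import dataclass, field


class ToolCallDenied(Exception):
    """Raised when the gateway refuses a tool call."""


@dataclass
class ToolGateway:
    allowed_tools: dict          # agent id -> set of tool names it may call
    max_calls_per_agent: int     # simple call cap standing in for a rate limit
    tools: dict = field(default_factory=dict)        # tool name -> callable
    call_counts: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def call(self, agent_id: str, tool_name: str, **kwargs):
        used = self.call_counts.get(agent_id, 0)
        if tool_name not in self.allowed_tools.get(agent_id, set()):
            self._record(agent_id, tool_name, "denied:unauthorised")
            raise ToolCallDenied(f"{agent_id} may not call {tool_name}")
        if used >= self.max_calls_per_agent:
            self._record(agent_id, tool_name, "denied:rate_limit")
            raise ToolCallDenied(f"{agent_id} exceeded its call budget")
        self.call_counts[agent_id] = used + 1
        try:
            result = self.tools[tool_name](**kwargs)
        except Exception as exc:
            # Failures are logged before being passed back to the runtime.
            self._record(agent_id, tool_name, f"failed:{type(exc).__name__}")
            raise
        self._record(agent_id, tool_name, "ok")
        return result

    def _record(self, agent_id, tool_name, outcome):
        self.audit_log.append(
            {"ts": time.time(), "agent": agent_id, "tool": tool_name, "outcome": outcome}
        )
```

&lt;p&gt;Every path through the gateway writes an audit record,
including the denials. That is the property that is expensive
to retrofit.&lt;/p&gt;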
&lt;h2&gt;The ownership boundaries&lt;/h2&gt;
&lt;p&gt;The platform strategy has to be explicit about who owns
which component.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Architecture function:&lt;/strong&gt; Tool gateway, agent registry,
observability layer, policy and override surface. These
are the platform components where firm-wide consistency
matters and where the architecture function carries the
authority.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI function or use case teams:&lt;/strong&gt; Agent runtime and the
specific agents on top. These are where the use case
diversity lives and where the architecture function
should set standards rather than centralise delivery.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Operations function:&lt;/strong&gt; Day-to-day operation of the
platform. SLA management, incident response, capacity
planning.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compliance and risk:&lt;/strong&gt; Policy authorship, audit
review, model and agent inventory governance.&lt;/p&gt;
&lt;p&gt;The boundaries are not always cleanly observed in
practice. The architecture function&#39;s job is to clarify
them and to defend them.&lt;/p&gt;
&lt;h2&gt;What goes wrong&lt;/h2&gt;
&lt;p&gt;Three failure patterns recur in firms attempting this
rewrite.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The strategy is written but not staffed.&lt;/strong&gt; The
architecture function publishes the platform strategy,
the leadership team endorses it, and then no team is
funded to build the platform components. The use case
teams build bespoke equivalents inside their own
deployments. Within a year, the firm has multiple
incompatible agent stacks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The platform is built but the use cases are not
governed onto it.&lt;/strong&gt; The platform exists; the use case
teams deploy outside it because the governance does not
force them onto it. The platform becomes a
sub-scale exhibit rather than the operating backbone.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The platform is built but the operating model is
not.&lt;/strong&gt; The platform runs; the day-to-day operations are
under-resourced; incidents accumulate; trust in the
platform erodes. The use case teams start building
their own again.&lt;/p&gt;
&lt;h2&gt;How to deliver the rewrite&lt;/h2&gt;
&lt;p&gt;Three sequencing decisions matter.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Build the tool gateway and observability layer
first.&lt;/strong&gt; These are the highest-leverage components and
the ones most expensive to retrofit. A single use case
team can be the first internal customer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Make the agent registry the source of truth before
scaling.&lt;/strong&gt; A platform that scales without the registry
becomes unmanageable inside twelve months. Build the
registry alongside the first production deployment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Govern the use cases onto the platform from day one.&lt;/strong&gt;
The first three or four agent deployments establish the
operating norm. If they happen outside the platform,
the platform is a long-term fiction.&lt;/p&gt;
&lt;h2&gt;Where this leaves the firm&lt;/h2&gt;
&lt;p&gt;Platform strategy for agentic AI is the highest-leverage
architectural decision the firm makes in 2026. The
firms that get this right will deliver multiple agent
capabilities cheaply and safely over the next three
years. The firms that get this wrong will spend the
same period rebuilding.&lt;/p&gt;
&lt;p&gt;Related reading: &lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;A reference architecture for agentic
AI in the regulated
enterprise&lt;/a&gt;,
&lt;a href=&quot;/blog/mcp-enterprise-standard/&quot;&gt;MCP is the most important enterprise standard nobody is
implementing&lt;/a&gt;,
&lt;a href=&quot;/blog/model-agent-registries-governance-artefact/&quot;&gt;Model and agent
registries&lt;/a&gt;,
&lt;a href=&quot;/blog/cyber-guardrails-ai-agents-regulated/&quot;&gt;Cyber guardrails for AI agents in regulated
workflows&lt;/a&gt;,
&lt;a href=&quot;/blog/cio-ai-agenda-2026/&quot;&gt;The CIO&#39;s AI agenda for 2026: an architect&#39;s
read&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Data residency for AI workloads: a working pattern for UK and EU enterprises</title>
    <link href="https://tarun.bulchandanis.com/blog/data-residency-ai-workloads-uk-eu/"/>
    <id>https://tarun.bulchandanis.com/blog/data-residency-ai-workloads-uk-eu/</id>
    <updated>2026-10-01T00:00:00.000Z</updated>
    <published>2026-10-01T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>BCG&#39;s &#39;AI sovereignty is an illusion, resilience is real&#39; piece reframes the debate well. The architecture function still has to translate the reframing into specific design choices. A working pattern for data residency in AI workloads for UK and EU enterprises.</summary>
    <content type="html">&lt;p&gt;BCG published &amp;quot;for most countries, AI sovereignty is an
illusion. Resilience is real&amp;quot; earlier this year. The
piece is a useful reframing of the public debate: pure
sovereignty over AI infrastructure is, for most
countries, not achievable at any reasonable cost, but
operational resilience under foreign-vendor dependency
is achievable and is the real engineering question.&lt;/p&gt;
&lt;p&gt;The reframing is correct at the level it operates. The
architecture function still has to translate it into
specific design choices for specific workloads. This
piece sets out the working pattern I use for UK and EU
enterprise AI workloads.&lt;/p&gt;
&lt;p&gt;(For the political framing of the broader debate, see
my earlier piece on &lt;a href=&quot;/blog/sovereign-ai-theatre/&quot;&gt;why sovereign AI is mostly
theatre&lt;/a&gt;.)&lt;/p&gt;
&lt;h2&gt;The data flows that matter&lt;/h2&gt;
&lt;p&gt;Five data flows have to be modelled for any AI workload
in a UK or EU enterprise.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Training data.&lt;/strong&gt; Data used to train or fine-tune
the model. In most regulated enterprises, this is the
firm&#39;s own customer or operational data, and the
residency requirements are explicit (GDPR, the FCA&#39;s
operational resilience rules, the EBA&#39;s outsourcing
guidance).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Inference input.&lt;/strong&gt; Data sent to the model at
inference time. Customer queries, transaction details,
document content. The residency requirements that
apply to the training data typically apply here too;
the architecture sometimes forgets this.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Inference output.&lt;/strong&gt; Data returned from the model.
For most use cases this is derivative of the input and
inherits the residency requirements, but for some use
cases (synthesised content, summarisation that
includes new material) the output deserves its own
residency analysis.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Prompt and instruction data.&lt;/strong&gt; The system
prompts, the guardrail prompts, the example libraries.
These are the firm&#39;s intellectual property and the
firm&#39;s risk surface. They deserve their own residency
treatment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. Audit and observability data.&lt;/strong&gt; Logs of model
calls, agent decisions, tool invocations, override
events. The residency for these is sometimes treated
as a technical concern; in regulated firms it is a
compliance concern.&lt;/p&gt;
&lt;h2&gt;The residency decision matrix&lt;/h2&gt;
&lt;p&gt;For each of the five data flows, the firm has four
deployment options.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Option A: Public API in the vendor&#39;s home region.&lt;/strong&gt;
The model runs on the vendor&#39;s infrastructure, in the
vendor&#39;s home country. Lowest cost, lowest control,
most permissive vendor terms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Option B: Public API in a vendor-managed regional
deployment.&lt;/strong&gt; The vendor runs the model in an EU or UK
region. Costs more, gives some residency control,
typically reasonable vendor terms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Option C: Private deployment in vendor-managed
infrastructure within the firm&#39;s residency
requirement.&lt;/strong&gt; The vendor provisions dedicated capacity
in the right jurisdiction. Higher cost, materially
better control, custom contractual terms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Option D: Firm-managed deployment.&lt;/strong&gt; The firm runs
the model itself, on its own infrastructure or on a
hyperscaler tenancy under its control. Highest cost,
highest control, full responsibility for the
operational characteristics.&lt;/p&gt;
&lt;p&gt;The matrix is not &amp;quot;pick one&amp;quot;; it is &amp;quot;pick one per data
flow per workload&amp;quot;. A typical regulated workload might
land at Option B for inference input, Option C for
training data and prompt configuration, and Option D
for audit data.&lt;/p&gt;
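&lt;p&gt;One way to keep the per-flow choice explicit is to record
it as data and refuse any plan that leaves a flow undecided. A
sketch in Python follows; the identifiers are hypothetical, and
the inference output entry is my assumption that it follows the
inference input.&lt;/p&gt;

```python
from enum import Enum


class Option(Enum):
    A = "public API, vendor home region"
    B = "public API, vendor-managed regional deployment"
    C = "private deployment, vendor-managed infrastructure"
    D = "firm-managed deployment"


# The five data flows that have to be modelled for each workload.
FLOWS = ("training", "inference_input", "inference_output", "prompt_config", "audit")


def validate_plan(plan: dict) -> dict:
    """Reject a residency plan that leaves any data flow without a choice."""
    missing = [flow for flow in FLOWS if flow not in plan]
    if missing:
        raise ValueError(f"no deployment option chosen for: {missing}")
    return plan


EXAMPLE_PLAN = validate_plan({
    "training": Option.C,
    "inference_input": Option.B,
    "inference_output": Option.B,   # assumption: output follows input
    "prompt_config": Option.C,
    "audit": Option.D,
})
```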
&lt;h2&gt;The five design rules&lt;/h2&gt;
&lt;p&gt;Five rules I apply to every data residency design.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Inference data follows the regulatory framework of
the customer, not the firm.&lt;/strong&gt; A UK firm serving an
Italian customer has to apply EU residency rules to
that customer&#39;s data flow, regardless of where the
firm is headquartered. The architecture has to be able
to discriminate.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Audit data residency matches the regulatory
retention.&lt;/strong&gt; If the firm has to retain audit data for
seven years in a defined jurisdiction, the audit data
flow has to land in that jurisdiction. Vendor terms that
allow audit data to be moved to other regions for &quot;operational
purposes&quot; are not acceptable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. The exit path has to be tested.&lt;/strong&gt; Operational
resilience is not theoretical. The firm has to be able
to fail over from Option A to Option B, or from Option
B to Option C, within a defined recovery time objective
(RTO). This needs to be exercised in production-like
conditions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. The prompt configuration is treated as
intellectual property.&lt;/strong&gt; Where the firm has invested in
prompt engineering, the prompts have to be protected as
firm IP. Vendor terms that grant the vendor rights to
use the prompts have to be negotiated out.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. The model substitution path is explicit.&lt;/strong&gt; If
the chosen model is no longer available (vendor
withdrawal, regulatory action, price change beyond
acceptable thresholds), the workload has to be
substitutable to an alternative. This is a design
constraint, not an afterthought.&lt;/p&gt;
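&lt;p&gt;Rule 1 is the one that most directly shapes the request
path. A minimal Python sketch of routing by customer
jurisdiction follows. The endpoint URLs are placeholders and the
country list is a deliberately incomplete subset of EU member
states, both assumptions for illustration.&lt;/p&gt;

```python
# Illustrative subset of EU member states (ISO 3166-1 alpha-2 codes).
EU_MEMBER_CODES = {"IT", "FR", "DE", "ES", "IE", "NL"}

# Placeholder endpoints; a real deployment would read these from configuration.
ENDPOINTS = {
    "UK": "https://inference.uk.example.internal",
    "EU": "https://inference.eu.example.internal",
}


def residency_regime(customer_country: str) -> str:
    """Pick the residency regime from the customer, not from the firm."""
    code = customer_country.upper()
    if code == "GB":
        return "UK"
    if code in EU_MEMBER_CODES:
        return "EU"
    # Fail closed: an unmapped jurisdiction is a design gap, not a default.
    raise ValueError(f"no residency regime mapped for {code}")


def endpoint_for(customer_country: str) -> str:
    return ENDPOINTS[residency_regime(customer_country)]
```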
&lt;h2&gt;What the working pattern looks like&lt;/h2&gt;
&lt;p&gt;For a typical regulated UK or EU enterprise in 2026, the
working pattern I see succeed in delivery is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Inference input and output:&lt;/strong&gt; Vendor-managed
regional deployment (Option B) for the bulk of
workloads, with private deployment (Option C) for the
most sensitive use cases.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Training and fine-tuning data:&lt;/strong&gt; Private deployment
(Option C) or firm-managed (Option D), depending on
the sensitivity and the volume.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prompt configuration:&lt;/strong&gt; Private deployment or
firm-managed, with explicit IP protections.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Audit and observability data:&lt;/strong&gt; Firm-managed
(Option D), in the jurisdiction of the regulatory
retention requirement.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is more expensive than the path-of-least-resistance
configuration (everything on the public API), and
materially less expensive than the maximalist
configuration (everything firm-managed). The trade-off
has to be deliberate.&lt;/p&gt;
&lt;h2&gt;Where this leaves the firm&lt;/h2&gt;
&lt;p&gt;Data residency for AI workloads is an architectural
decision, not a policy decision. The architecture
function has to model the data flows, choose the
deployment options per flow, and design the operational
resilience.&lt;/p&gt;
&lt;p&gt;For firms doing this work in 2026, my recommendation is
to model the data flows explicitly before any vendor
selection, to negotiate the vendor terms against the
flow analysis, and to budget for the higher-control
options for the data flows that actually need them.&lt;/p&gt;
&lt;p&gt;Related reading: &lt;a href=&quot;/blog/sovereign-ai-theatre/&quot;&gt;Sovereign AI is mostly
theatre&lt;/a&gt;, &lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;A reference
architecture for agentic AI in the regulated
enterprise&lt;/a&gt;,
&lt;a href=&quot;/blog/uk-fs-regulation-ai-architecture-2026/&quot;&gt;What UK financial services regulation means for AI
architecture in
2026&lt;/a&gt;,
&lt;a href=&quot;/blog/identity-first-security-perimeter/&quot;&gt;Identity-first security: rethinking the enterprise
perimeter&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Model and agent registries: the missing governance artefact</title>
    <link href="https://tarun.bulchandanis.com/blog/model-agent-registries-governance-artefact/"/>
    <id>https://tarun.bulchandanis.com/blog/model-agent-registries-governance-artefact/</id>
    <updated>2026-09-24T00:00:00.000Z</updated>
    <published>2026-09-24T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Every regulated firm needs a model inventory under SS1/23. Most build one. Few build it well. The next layer up, the agent registry, is barely discussed in the public commentary. A practitioner&#39;s view on what both should contain and why neither is optional.</summary>
    <content type="html">&lt;p&gt;SS1/23 is explicit: a regulated UK financial services
firm must maintain a model inventory covering its
material model use. The EBA guidance and the Federal
Reserve&#39;s SR 11-7 say substantively the same thing in
the EU and US frameworks. The model risk management
function in every regulated firm now has a regulatory
obligation to keep this artefact current.&lt;/p&gt;
&lt;p&gt;In most firms, the artefact exists in some form. In few
firms is it built to a standard that genuinely supports
the governance the regulators expect. And in almost no
firms is there an equivalent registry for agents.&lt;/p&gt;
&lt;p&gt;This piece sets out what both registries should contain
and why neither is optional in 2026.&lt;/p&gt;
&lt;h2&gt;The model registry&lt;/h2&gt;
&lt;p&gt;The model registry is the catalogue of every material
model in production use. The contents the regulator
expects, in my reading of the supervisory statements
and the equivalent guidance:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Identification.&lt;/strong&gt; Each model has a unique identifier,
a version, and a lineage trail showing where it came
from. Foundation models are identified by vendor and
version; bespoke models are identified by the firm&#39;s
internal versioning scheme.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Purpose and use case.&lt;/strong&gt; What is the model used for?
What decision does it support? Where in the firm&#39;s
operating model does it sit?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Risk classification.&lt;/strong&gt; What is the materiality of the
model&#39;s use? What are the downside scenarios if it
fails? How is the failure caught?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Validation status.&lt;/strong&gt; Has the model been validated?
When? Against what? Who signed off?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Performance monitoring.&lt;/strong&gt; What metrics are tracked?
What thresholds trigger review? What is the cadence?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Accountable senior manager.&lt;/strong&gt; Named, in the SMCR sense
or equivalent. The person who carries the regulatory
accountability for this model.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lifecycle status.&lt;/strong&gt; In development, in pilot, in
production, in deprecation, retired. Each transition
has a defined approval workflow.&lt;/p&gt;
&lt;p&gt;This list is not exhaustive. The point is that the
registry is operational, not documentary. It supports
ongoing governance, not just an annual exercise.&lt;/p&gt;
&lt;h2&gt;What goes wrong&lt;/h2&gt;
&lt;p&gt;Three patterns recur in firms that have built a model
registry but built it poorly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The registry is a snapshot.&lt;/strong&gt; It is updated annually
as part of the audit cycle. By the time the audit team
reads it, the firm has changed model versions, deployed
new use cases and retired old ones. The snapshot does
not match the reality.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The registry has no policy enforcement.&lt;/strong&gt; Adding a
new model to production does not require updating the
registry. The discipline relies on people remembering;
in busy delivery cycles, people forget.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The registry has no observability.&lt;/strong&gt; The registry
records the metadata; the live model performance lives
in operational systems and is not connected back to the
registry. The accountable senior manager has no
real-time view of what the registry says they own.&lt;/p&gt;
&lt;p&gt;A registry with these characteristics is a compliance
artefact, not a governance artefact. The regulators are
increasingly attentive to the difference.&lt;/p&gt;
&lt;h2&gt;The agent registry&lt;/h2&gt;
&lt;p&gt;An agent registry extends the model registry pattern to
cover AI agents. An agent is more than a model: it has
a prompt configuration, an authorisation policy, a tool
inventory and an operating envelope. Each of these is
material to the agent&#39;s behaviour and material to the
governance.&lt;/p&gt;
&lt;p&gt;The agent registry contents I have settled on, beyond
the model registry fields:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Authorisation policy.&lt;/strong&gt; What is the agent allowed to
read, call and write? What data sources, what tools,
what systems?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Prompt configuration.&lt;/strong&gt; The system prompt, the
guardrail prompts, the example library. Versioned;
changes flow through change control.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tool inventory.&lt;/strong&gt; The specific tools the agent has
access to. Mapped to the underlying systems and the
policy engine entries.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Operating envelope.&lt;/strong&gt; The volume the agent is
authorised to handle, the budget cap if applicable, the
escalation thresholds.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Override interface.&lt;/strong&gt; How a human operator overrides
the agent. Named operators, audit trail, escalation
path.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Incident history.&lt;/strong&gt; Every incident attributed to the
agent. Material for trend analysis and for governance
review.&lt;/p&gt;
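&lt;p&gt;As a rough illustration, the agent-specific fields could be
held in a record like the Python sketch below. The field names
and types are my own assumptions, and the model registry fields
are reduced to three for brevity.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentRegistryEntry:
    # Fields carried over from the model registry (abbreviated).
    agent_id: str
    lifecycle_status: str            # e.g. "pilot", "production", "retired"
    accountable_owner: str
    # Authorisation policy: what the agent may read and write.
    readable_sources: frozenset
    writable_targets: frozenset
    # Tool inventory: the specific tools the agent may call.
    tool_inventory: frozenset
    # Prompt configuration is versioned and changes through change control.
    prompt_config_version: str
    # Operating envelope.
    max_daily_invocations: int
    budget_cap: Optional[float]
    # Override interface: the named operators who may intervene.
    override_operators: tuple
    # Incident history, appended to over the life of the agent.
    incidents: list = field(default_factory=list)

    def may_call(self, tool_name: str) -> bool:
        """Only agents in production may call tools in their inventory."""
        return self.lifecycle_status == "production" and tool_name in self.tool_inventory
```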
&lt;h2&gt;Why the registry is the source of truth, not a copy&lt;/h2&gt;
&lt;p&gt;The single most important architectural decision is to
make the registry the source of truth. Not a copy of
data that lives elsewhere; the authoritative record
that the rest of the control plane reads from.&lt;/p&gt;
&lt;p&gt;When the policy engine makes an authorisation decision,
it reads from the agent registry, not from a
synchronised copy. When the audit log records an action,
it tags the action against the registry entry, not
against a denormalised label. When the override
interface acts, it acts against the registry entry, not
a copy.&lt;/p&gt;
&lt;p&gt;This design decision sounds technical, but it is the
single most consequential governance decision. A
registry that is a copy is a record-keeping exercise. A
registry that is the source of truth is the operating
backbone.&lt;/p&gt;
&lt;h2&gt;The implementation pattern&lt;/h2&gt;
&lt;p&gt;Five components.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The registry data store.&lt;/strong&gt; Persistent, queryable,
versioned. Supports atomic updates. Audited.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The registry API.&lt;/strong&gt; Read and write access for the
control plane components, the operations team, and the
audit function. Authentication is non-negotiable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The change control workflow.&lt;/strong&gt; Updates to the registry
flow through a defined approval process. Some changes
are routine (incident logging); some require senior
management sign-off (authorisation policy changes, new
agents into production).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The synchronisation pattern.&lt;/strong&gt; Where downstream
systems need a local cache of registry data (for
performance reasons), the synchronisation is one-way
from the registry, and the local cache has a defined
freshness expectation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The audit and review function.&lt;/strong&gt; The registry contents
are reviewed periodically: monthly for lifecycle
transitions, quarterly for risk-classified changes,
annually for the full estate.&lt;/p&gt;
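&lt;p&gt;The synchronisation pattern is worth a short sketch. The
Python below shows a local cache that only ever reads from the
registry and refuses to serve entries older than a stated
freshness bound. The class name and the fetch callable are
hypothetical stand-ins for the registry API.&lt;/p&gt;

```python
import time


class RegistryCache:
    """Read-only local cache of registry entries with a freshness bound."""

    def __init__(self, fetch_entry, max_age_seconds, clock=time.monotonic):
        self._fetch_entry = fetch_entry   # reads from the registry, the source of truth
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}                # agent id -> (fetched_at, entry)

    def get(self, agent_id):
        cached = self._entries.get(agent_id)
        now = self._clock()
        if cached is not None and self._max_age >= now - cached[0]:
            return cached[1]
        # Stale or missing: go back to the registry rather than serve old data.
        entry = self._fetch_entry(agent_id)
        self._entries[agent_id] = (now, entry)
        return entry
```

&lt;p&gt;There is deliberately no write method. Changes go to the
registry through change control and reach the cache on the next
refresh.&lt;/p&gt;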
&lt;h2&gt;Where this leaves the firm&lt;/h2&gt;
&lt;p&gt;The model registry is a regulatory obligation. The
agent registry will be, soon. Building either properly
costs less than building it poorly and recovering
later. The architecture function is typically the right
owner.&lt;/p&gt;
&lt;p&gt;For firms that have a thin model registry today, my
recommendation is to invest in making the existing
registry the source of truth before extending into the
agent registry. The two layers have different content
but share the same architectural pattern.&lt;/p&gt;
&lt;p&gt;Related reading: &lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;A reference architecture for agentic
AI in the regulated
enterprise&lt;/a&gt;,
&lt;a href=&quot;/blog/non-human-identity-ai-agents-ea-pattern/&quot;&gt;Non-human identity in the age of AI
agents&lt;/a&gt;,
&lt;a href=&quot;/blog/cyber-guardrails-ai-agents-regulated/&quot;&gt;Cyber guardrails for AI agents in regulated
workflows&lt;/a&gt;,
&lt;a href=&quot;/blog/architectural-fitness-functions-framework/&quot;&gt;Architectural fitness functions: a practical
framework&lt;/a&gt;,
&lt;a href=&quot;/blog/auditing-agent-decisions/&quot;&gt;Auditing agent decisions&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Non-human identity in the age of AI agents: an enterprise architecture pattern</title>
    <link href="https://tarun.bulchandanis.com/blog/non-human-identity-ai-agents-ea-pattern/"/>
    <id>https://tarun.bulchandanis.com/blog/non-human-identity-ai-agents-ea-pattern/</id>
    <updated>2026-09-17T00:00:00.000Z</updated>
    <published>2026-09-17T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>KPMG&#39;s &#39;invisible access, visible risk&#39; piece gestures at the problem. None of the major firms has published a defensible architectural pattern. The non-human identity problem is now the most under-addressed gap in enterprise identity architecture.</summary>
    <content type="html">&lt;p&gt;Non-human identity is the identity assigned to a system,
a service, an automated process or, increasingly, an AI
agent. The category has existed for decades; the
treatment in most enterprises has been informal. Service
accounts are created ad hoc, shared across teams,
rotated rarely, and revoked when somebody remembers to
do so.&lt;/p&gt;
&lt;p&gt;The agentic shift has made this informal treatment
untenable. An AI agent is, structurally, a non-human
identity. It needs to authenticate, it needs to be
authorised against specific resources, it needs to be
auditable, and it needs to be revocable. The legacy
service-account treatment does not deliver any of these
reliably.&lt;/p&gt;
&lt;p&gt;This piece sets out the enterprise architecture pattern
I use for non-human identity in environments where AI
agents are deployed.&lt;/p&gt;
&lt;h2&gt;The legacy problem&lt;/h2&gt;
&lt;p&gt;Five legacy patterns recur in most enterprise estates.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Shared credentials.&lt;/strong&gt; A service account whose password
or API key is shared across multiple systems and teams.
When the credential is compromised or needs rotation,
identifying the dependent systems requires investigation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Long-lived credentials.&lt;/strong&gt; API keys that have not been
rotated in years. The owner has moved on, the team has
restructured, and the keys still grant production access.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Over-privileged credentials.&lt;/strong&gt; A service account
created with broad access at the time of deployment
because narrowing the access was operationally
expensive. The broad access persists long after the
original need.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Undocumented credentials.&lt;/strong&gt; Service accounts that
exist in production but do not appear in any inventory.
The accounts are discovered during audit and the
ownership is contested.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Credentials without lifecycle.&lt;/strong&gt; No defined
creation process, no defined renewal cycle, no defined
revocation process. The credentials persist indefinitely
unless somebody actively removes them.&lt;/p&gt;
&lt;p&gt;Each of these is a discrete control failure. Together
they create the conditions in which an AI agent
deployment can quietly accumulate authority well beyond
its operating need.&lt;/p&gt;
&lt;h2&gt;The pattern for AI agents&lt;/h2&gt;
&lt;p&gt;Six components turn up in every working implementation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. The agent is a first-class identity.&lt;/strong&gt; The agent
has its own identity in the identity provider, not a
shared service account. The identity is created
deliberately, scoped explicitly, and lifecycle-managed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. The identity carries metadata.&lt;/strong&gt; The agent
identity record includes: the agent&#39;s purpose, its
authorisation policy, its accountable owner, its model
configuration, its lifecycle status. This metadata is
the source of truth referenced by the rest of the
control plane.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. The credentials are short-lived.&lt;/strong&gt; The agent
authenticates using short-lived tokens (typically 15
minutes to a few hours) issued against the agent&#39;s
identity. Long-lived API keys are rejected as a control
pattern.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. The authorisation is policy-driven.&lt;/strong&gt; The
agent&#39;s access to data, tools and systems is enforced
by a policy engine that reads from the agent identity
record. Changes to authorisation flow through the
identity record, not through ad-hoc grant changes on
the dependent systems.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. The lifecycle is automated.&lt;/strong&gt; Agent identity
creation follows a defined approval workflow. Renewal
is tied to the agent&#39;s lifecycle status. Revocation is
automated when the agent reaches end-of-life or is
flagged in incident response.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;6. The audit is end-to-end.&lt;/strong&gt; Every authentication,
every authorisation decision, every resource access is
logged against the agent identity. The audit trail is
retrievable for the regulatory retention period.&lt;/p&gt;
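&lt;p&gt;Components 2, 3 and 5 can be sketched in a few lines. This is a minimal illustration, not a specific identity provider&#39;s API: the AgentIdentity and Token types, the field names and the 15-minute default are all assumptions made for the example.&lt;/p&gt;

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical identity record; field names are illustrative."""
    agent_id: str
    purpose: str
    owner: str
    model_configuration: str
    lifecycle_status: str  # e.g. "active", "suspended", "retired"


@dataclass(frozen=True)
class Token:
    agent_id: str
    value: str
    expires_at: datetime


def issue_token(identity, now, ttl=timedelta(minutes=15)):
    """Issue a short-lived token, but only for an identity that is active."""
    if identity.lifecycle_status != "active":
        raise PermissionError(f"{identity.agent_id} is {identity.lifecycle_status}")
    return Token(identity.agent_id, secrets.token_urlsafe(32), now + ttl)


def token_is_valid(token, now):
    """A token is valid strictly before its expiry time."""
    return token.expires_at > now


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
agent = AgentIdentity(
    agent_id="complaints-triage-01",
    purpose="Classify and route customer complaints",
    owner="head-of-complaints",
    model_configuration="model-config-v3",
    lifecycle_status="active",
)
token = issue_token(agent, now)
```

&lt;p&gt;Because issuance checks the lifecycle status, setting an identity to retired stops new tokens immediately, and any token already issued expires within the time-to-live.&lt;/p&gt;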
&lt;h2&gt;The implementation pattern&lt;/h2&gt;
&lt;p&gt;The pattern lands in the architecture in three places.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The identity provider.&lt;/strong&gt; The firm&#39;s existing IDP
(Entra ID, Okta, Auth0, etc.) extended to support
non-human principals with the metadata above. Most
mature IDPs support this; the work is in the
configuration and the discipline.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The policy decision point.&lt;/strong&gt; A policy engine (OPA,
Cedar, a commercial equivalent) that consumes the agent
identity record and makes authorisation decisions. The
engine is auditable independently of the agent runtime.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The agent platform.&lt;/strong&gt; The agent runtime acquires
short-lived tokens via the IDP, presents them at the
policy decision point, and operates within the
authorised envelope. The agent does not hold long-lived
credentials.&lt;/p&gt;
&lt;h2&gt;The transition pattern&lt;/h2&gt;
&lt;p&gt;Most firms have legacy non-human identities that do not
conform to this pattern. The transition is operationally
heavy and usually staged.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stage 1: inventory.&lt;/strong&gt; Identify every non-human
identity in the estate. Most firms find more than
expected.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stage 2: classification.&lt;/strong&gt; Classify each identity by
risk and by amenability to the new pattern. Some legacy
identities will be replaced; some will be retired; some
will be wrapped.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stage 3: new identities use the new pattern.&lt;/strong&gt; From
a defined date, every new non-human identity is created
under the new pattern. This includes every new AI
agent.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stage 4: high-risk legacy identities migrate.&lt;/strong&gt; The
identities with the highest access privileges and the
largest blast radius are migrated first.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stage 5: the long tail migrates over an extended
timeline.&lt;/strong&gt; Most firms will have legacy non-human
identities in the estate for years. The discipline is
to prevent the legacy pattern from being extended into
new use cases.&lt;/p&gt;
&lt;h2&gt;Where this leaves the firm&lt;/h2&gt;
&lt;p&gt;The non-human identity problem is among the most
under-managed control gaps in enterprise estates.
The agentic shift has elevated it from a hygiene issue
to a material control. The architecture function is
typically the right owner of the migration to a
deliberate pattern.&lt;/p&gt;
&lt;p&gt;For firms doing this work in 2026, my recommendation is
to lock the new pattern for all new identities first,
then prioritise the migration of high-risk legacy
identities, and accept that the long tail will resolve
over multiple years.&lt;/p&gt;
&lt;p&gt;Related reading: &lt;a href=&quot;/blog/identity-first-security-perimeter/&quot;&gt;Identity-first security: rethinking
the enterprise
perimeter&lt;/a&gt;, &lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;A
reference architecture for agentic AI in the regulated
enterprise&lt;/a&gt;,
&lt;a href=&quot;/blog/cyber-guardrails-ai-agents-regulated/&quot;&gt;Cyber guardrails for AI agents in regulated
workflows&lt;/a&gt;,
&lt;a href=&quot;/blog/model-agent-registries-governance-artefact/&quot;&gt;Model and agent registries: the missing governance
artefact&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>A reference architecture for agentic AI in the regulated enterprise</title>
    <link href="https://tarun.bulchandanis.com/blog/reference-architecture-agentic-ai-regulated/"/>
    <id>https://tarun.bulchandanis.com/blog/reference-architecture-agentic-ai-regulated/</id>
    <updated>2026-09-10T00:00:00.000Z</updated>
    <published>2026-09-10T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>McKinsey published &#39;rethinking enterprise architecture for the agentic era&#39; in early 2026. The piece sets the strategic direction. This is the working reference architecture I use when actually delivering against that direction inside a regulated enterprise.</summary>
    <content type="html">&lt;p&gt;McKinsey published &amp;quot;rethinking enterprise architecture
for the agentic era&amp;quot; earlier this year. The strategic
direction is right. The article does not, and could not,
go to the level of architectural specificity a practising
architect needs.&lt;/p&gt;
&lt;p&gt;This piece is the working reference architecture I use
when actually delivering this work in a regulated
enterprise. It is calibrated against my experience as
the architecture leader for an organisation operating in
multiple regulated jurisdictions, plus a couple of
production deployments I have shipped.&lt;/p&gt;
&lt;h2&gt;The five-layer reference&lt;/h2&gt;
&lt;p&gt;A working agentic AI architecture in a regulated firm
has five layers. Each is necessary; none is sufficient
on its own.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Layer 1: Foundation models.&lt;/strong&gt; The underlying models
(GPT-class, Claude-class, Gemini-class, plus
domain-specific models). The firm should treat this as a
substitutable layer. The architecture function&#39;s job is
to make it cleanly substitutable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Layer 2: Model serving.&lt;/strong&gt; Where the models execute.
For regulated firms, this is rarely the model vendor&#39;s
public API in production; it is more often the model
vendor&#39;s enterprise tenant, a private deployment, or a
sovereign instance. The architecture choice here is
material and is covered separately in &lt;a href=&quot;/blog/data-residency-ai-workloads-uk-eu/&quot;&gt;Data residency
for AI workloads&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Layer 3: Agent runtime.&lt;/strong&gt; The orchestration layer that
turns model calls into agent behaviour. Tool calling,
memory, multi-step reasoning, observability. The firm
should own this layer; outsourcing it to a single vendor
creates lock-in.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Layer 4: Tool gateway.&lt;/strong&gt; The mediated interface
between the agents and the firm&#39;s existing systems.
Policy enforcement, audit logging, rate limiting,
authentication. This is where MCP integration lands in
practice. See &lt;a href=&quot;/blog/mcp-enterprise-standard/&quot;&gt;MCP is the most important enterprise
standard nobody is implementing&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Layer 5: Domain applications.&lt;/strong&gt; The agents themselves,
calibrated to specific business processes. Each agent
has a defined purpose, a defined authority boundary, a
defined operating envelope. This is where the value is
captured and where the use case diversity lives.&lt;/p&gt;
&lt;h2&gt;The platform components&lt;/h2&gt;
&lt;p&gt;Across the five layers, six platform components recur in
every deployment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The agent registry.&lt;/strong&gt; Single canonical record of every
agent in the firm. Purpose, owner, authorisation policy,
model and prompt configuration, lifecycle status,
deployment environments, accountable senior manager.
The registry is the source of truth; nothing operates in
production without an entry.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The policy engine.&lt;/strong&gt; Authorisation decisions are made
by the policy engine. Agents request access; the engine
allows or denies. The engine reads from the agent
registry and from the firm&#39;s broader authorisation
policy. Auditable independently of the agent.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The audit log.&lt;/strong&gt; Immutable record of every action
taken by every agent. Tamper-evident, queryable,
retained for the regulatory retention period. The audit
log is the firm&#39;s primary evidence in the event of an
incident or a regulatory query.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The override interface.&lt;/strong&gt; Human operators can override
agent decisions. The override is logged as a
first-class event, attributed to the named operator,
and reviewed periodically for systemic patterns.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The monitoring and observability layer.&lt;/strong&gt; Real-time
visibility into agent behaviour. Anomaly detection.
Performance monitoring. SLA compliance. Cost
attribution.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The incident workflow.&lt;/strong&gt; When something goes wrong
(an unexpected tool call, a quality threshold breach, a
customer complaint that traces back to agent
behaviour), the incident workflow notifies, captures
and resolves.&lt;/p&gt;
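&lt;p&gt;One common way to make an audit log tamper-evident is a hash chain, where each entry&#39;s hash covers the previous entry&#39;s hash. The sketch below assumes an in-memory list and hypothetical field names; a production log would sit on write-once storage.&lt;/p&gt;

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(body):
    """Hash a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log, agent_id, action):
    """Append an entry whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"agent_id": agent_id, "action": action, "prev_hash": prev_hash}
    log.append({**body, "hash": _digest(body)})


def verify_chain(log):
    """Return True only if every entry's hash and back-link are intact."""
    prev_hash = GENESIS
    for entry in log:
        body = {key: entry[key] for key in ("agent_id", "action", "prev_hash")}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, "complaints-triage-01", "read:complaints-db")
append_entry(log, "complaints-triage-01", "write:case-notes")
```

&lt;p&gt;Editing or reordering an earlier entry breaks verification. Truncation from the tail is not detected by the chain alone, so the latest hash would need to be anchored somewhere the platform cannot rewrite.&lt;/p&gt;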
&lt;h2&gt;The non-negotiable design principles&lt;/h2&gt;
&lt;p&gt;There are four principles in the reference architecture
that I will not compromise on.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. The agent has no authority that has not been
explicitly granted.&lt;/strong&gt; Default deny. Every action the
agent can take is enumerable from the agent registry.
If the registry says no, the policy engine says no.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. The audit log is immutable and is generated by the
platform, not the agent.&lt;/strong&gt; The agent cannot decide what
to log. The platform observes and logs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. The override path is documented and tested.&lt;/strong&gt; A
human operator can stop an agent within an explicit SLA.
The path is exercised in production-like conditions
regularly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. The accountability is named.&lt;/strong&gt; Each agent has a
named senior manager. SMCR or equivalent already
requires this in regulated firms; the architecture
reinforces it.&lt;/p&gt;
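&lt;p&gt;Principle 1 is simple to state in code. The sketch below assumes a hypothetical registry held as a dictionary of explicit grants; the agent, action and resource names are invented for illustration.&lt;/p&gt;

```python
# Hypothetical registry: agent id mapped to status and explicit grants.
REGISTRY = {
    "complaints-triage": {
        "status": "active",
        "grants": {("read", "complaints-db"), ("write", "case-notes")},
    },
}


def decide(registry, agent_id, action, resource):
    """Return 'allow' only for an explicit grant held by an active agent."""
    entry = registry.get(agent_id)
    if entry is None or entry["status"] != "active":
        return "deny"
    return "allow" if (action, resource) in entry["grants"] else "deny"


def enumerate_authority(registry, agent_id):
    """List everything the agent may do, read straight from the registry."""
    entry = registry.get(agent_id, {})
    return sorted(entry.get("grants", set()))
```

&lt;p&gt;There is no allow branch other than the explicit grant check, so an unknown agent, an inactive agent and an unlisted action all fall through to deny.&lt;/p&gt;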
&lt;h2&gt;The trade-offs the reference does not resolve&lt;/h2&gt;
&lt;p&gt;Three trade-offs are firm-specific and the reference
intentionally leaves them open.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Build vs buy on the agent runtime.&lt;/strong&gt; A bespoke runtime
gives full control and full operating cost. A vendor
runtime gives faster time-to-value and a vendor
dependency. The right answer depends on the firm&#39;s
existing engineering capacity, the strategic importance
of the agent capability, and the firm&#39;s vendor risk
posture.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Centralised vs federated platform ownership.&lt;/strong&gt; A
centralised platform team owns the agent platform and
serves the use case teams. A federated model gives each
use case team its own platform stack with central
standards. The trade-off is consistency vs autonomy.
Most regulated firms should start centralised and
relax over time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MCP-native vs custom integration.&lt;/strong&gt; This trade-off is
introduced under Layer 4 above and covered in &lt;a href=&quot;/blog/mcp-enterprise-standard/&quot;&gt;the MCP piece&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Where this leaves the firm&lt;/h2&gt;
&lt;p&gt;The reference architecture above is what I have seen
work in practice. It is not the only configuration that
works; it is the one I have the most confidence in for
regulated environments.&lt;/p&gt;
&lt;p&gt;For firms starting this work in 2026, my recommendation
is to invest in the platform components (the Layer 4
tool gateway, the registry, the policy engine and the
audit log) before the
first agent goes into production. Building the platform
afterwards is significantly more expensive than
building it first.&lt;/p&gt;
&lt;p&gt;Related reading: &lt;a href=&quot;/blog/auditing-agent-decisions/&quot;&gt;Auditing agent
decisions&lt;/a&gt;, &lt;a href=&quot;/blog/cyber-guardrails-ai-agents-regulated/&quot;&gt;Cyber
guardrails for AI agents in regulated
workflows&lt;/a&gt;,
&lt;a href=&quot;/blog/mcp-enterprise-standard/&quot;&gt;MCP is the most important enterprise standard nobody is
implementing&lt;/a&gt;, &lt;a href=&quot;/blog/uk-fs-regulation-ai-architecture-2026/&quot;&gt;What UK
financial services regulation means for AI architecture
in 2026&lt;/a&gt;,
&lt;a href=&quot;/blog/architectural-fitness-functions-framework/&quot;&gt;Architectural fitness functions: a practical
framework&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Banking and financial services architecture: top trends 2026</title>
    <link href="https://tarun.bulchandanis.com/blog/banking-fs-architecture-top-trends-2026/"/>
    <id>https://tarun.bulchandanis.com/blog/banking-fs-architecture-top-trends-2026/</id>
    <updated>2026-09-03T00:00:00.000Z</updated>
    <published>2026-09-03T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Capgemini publishes &#39;banking top trends 2026&#39; and &#39;financial services top trends 2026&#39;. Both are written from the advisor&#39;s vantage. The architecture function&#39;s read of the same operating environment is different and more specific. Seven trends from inside the function.</summary>
    <content type="html">&lt;p&gt;Capgemini publishes its top-trends series for banking,
insurance and financial services each year. BCG runs its
AI Radar. PwC publishes its UK financial services
regulatory commentary. McKinsey publishes its banking
agentic AI work. The set of pieces is consistent and
broadly agrees on the strategic agenda.&lt;/p&gt;
&lt;p&gt;The architecture function&#39;s read of the same operating
environment is different. The advisor writes for the
chief executive, the chief risk officer, the chief
financial officer. The architect writes for the people
who have to deliver the systems that will make any of
this real.&lt;/p&gt;
&lt;p&gt;This piece is the architect&#39;s view of the 2026 agenda.&lt;/p&gt;
&lt;h2&gt;Trend 1: regulatory engagement shifts upstream&lt;/h2&gt;
&lt;p&gt;The FCA, PRA and EBA have moved noticeably faster on AI,
operational resilience and outsourcing in the last
eighteen months. The architecture function&#39;s involvement
in regulatory engagement is moving from &amp;quot;after-the-fact
review&amp;quot; to &amp;quot;design-phase consultation&amp;quot;. Firms that have
not made this shift are paying for it through extended
implementation timelines.&lt;/p&gt;
&lt;p&gt;What to do: embed the regulatory function into the
architecture review process, not the other way around.&lt;/p&gt;
&lt;h2&gt;Trend 2: agentic AI moves from pilot to production&lt;/h2&gt;
&lt;p&gt;The pilot programmes of 2024 and 2025 are now in
production. The production environment surfaces problems
the pilot environment did not: scaling cost, audit trail
discipline, vendor lock-in, change control. The
architecture function carries more of this load than the
2024 sales pitch implied.&lt;/p&gt;
&lt;p&gt;What to do: budget for steady-state operating cost of
agent infrastructure (registry, policy engine, audit
trail, override interface) before scaling beyond the
pilot footprint.&lt;/p&gt;
&lt;h2&gt;Trend 3: the core banking and policy admin platform modernisation cycle is accelerating&lt;/h2&gt;
&lt;p&gt;The legacy mainframe estate that survived the last
modernisation cycle (2010s) is now under genuine
pressure. The agentic shift has changed the integration
demands; the regulatory shift has changed the
data-residency demands; the cost-base pressure has
changed the executive appetite for the modernisation
programme.&lt;/p&gt;
&lt;p&gt;What to do: separate the modernisation business case
from the AI investment business case. Treat them as
sequenced rather than combined. The combined business
case is too brittle to defend through the inevitable
re-baselining cycles.&lt;/p&gt;
&lt;h2&gt;Trend 4: the architecture function shifts from &amp;quot;reviewer&amp;quot; to &amp;quot;delivery owner&amp;quot; on agent capabilities&lt;/h2&gt;
&lt;p&gt;In 2024, the architecture function reviewed the agent
deployments after the AI function had built them. In
2026, the architecture function in well-run firms owns
the agent platform: registry, policy engine, tool
gateway. The AI function owns the use cases on top.&lt;/p&gt;
&lt;p&gt;What to do: clarify the ownership boundary explicitly.
If the architecture function does not own the platform,
the firm will have multiple bespoke implementations
within twelve months.&lt;/p&gt;
&lt;h2&gt;Trend 5: MCP and equivalent standards become material to vendor selection&lt;/h2&gt;
&lt;p&gt;Foundation model providers, vertical AI tools and SaaS
platforms with embedded AI are being assessed against
their interoperability with MCP and equivalent
standards. The firms that get this right preserve
optionality; the firms that do not are committing to
vendor-specific integration that becomes expensive to
unwind.&lt;/p&gt;
&lt;p&gt;What to do: include interoperability standards
compliance in the vendor assessment criteria. See &lt;a href=&quot;/blog/mcp-enterprise-standard/&quot;&gt;MCP
is the most important enterprise standard nobody is
implementing&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Trend 6: data residency and sovereignty become architecture decisions, not just policy decisions&lt;/h2&gt;
&lt;p&gt;The data residency requirements for financial services
have firmed up in both the UK and EU. The architecture
function has to design for residency, not just declare
it in policy. The implications cascade through model
hosting, vector database location, audit trail storage
and recovery infrastructure.&lt;/p&gt;
&lt;p&gt;What to do: model the residency requirements at the
data-flow level, not just at the data-class level.&lt;/p&gt;
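&lt;p&gt;The difference between the two levels shows up as soon as a flow is written down hop by hop. In this sketch the data classes, regions and flow are hypothetical, and the policy table stands in for whatever the firm has actually committed to.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    name: str
    data_class: str
    hops: tuple  # regions the data passes through, in order


# Hypothetical policy: regions permitted for each data class.
PERMITTED_REGIONS = {
    "customer-pii": {"uk"},
    "market-data": {"uk", "eu"},
}


def residency_violations(flow, policy=PERMITTED_REGIONS):
    """Return every hop outside the regions allowed for the flow's data class."""
    allowed = policy.get(flow.data_class, set())
    return [hop for hop in flow.hops if hop not in allowed]


# Stored in the UK at both ends, but embedded by a model hosted elsewhere.
flow = DataFlow("complaint-embedding", "customer-pii", ("uk", "us", "uk"))
violations = residency_violations(flow)
```

&lt;p&gt;A data-class check that looks only at where the data rests would pass this flow. The hop-level check flags the intermediate region.&lt;/p&gt;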
&lt;h2&gt;Trend 7: model risk management becomes a steady-state discipline&lt;/h2&gt;
&lt;p&gt;The PRA&#39;s SS1/23 and the equivalent EBA guidance now apply to AI
models in production. Model risk management is no
longer a project-phase concern; it is a steady-state
operating discipline. The architecture function carries
significant weight in maintaining the model inventory,
the validation evidence and the performance monitoring.&lt;/p&gt;
&lt;p&gt;What to do: fund the model risk management function
properly. The model inventory needs ongoing engineering
support, not just policy support.&lt;/p&gt;
&lt;h2&gt;Where this leaves the firm&lt;/h2&gt;
&lt;p&gt;The 2026 agenda is more operationally weighty than the
2024 agenda. The pieces that worked as pilots have to
work as production. The pieces that worked at the
strategic narrative layer have to work at the systems
layer.&lt;/p&gt;
&lt;p&gt;The firms that will land 2026 well are the ones whose
architecture function has the seniority, the funding
and the authority to carry this. The firms where the
architecture function is a service provider to other
functions will struggle.&lt;/p&gt;
&lt;p&gt;Related reading: &lt;a href=&quot;/blog/uk-fs-regulation-ai-architecture-2026/&quot;&gt;What UK financial services regulation
means for AI architecture in
2026&lt;/a&gt;, &lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;A
reference architecture for agentic AI in the regulated
enterprise&lt;/a&gt;,
&lt;a href=&quot;/blog/cyber-guardrails-ai-agents-regulated/&quot;&gt;Cyber guardrails for AI agents in regulated
workflows&lt;/a&gt;,
&lt;a href=&quot;/blog/cio-ai-agenda-2026/&quot;&gt;The CIO&#39;s AI agenda for 2026: an architect&#39;s
read&lt;/a&gt;, &lt;a href=&quot;/blog/top-trends-enterprise-architecture-2026/&quot;&gt;Top trends in
enterprise architecture
2026&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>How AI is reshaping the compliance function: an architect&#39;s view</title>
    <link href="https://tarun.bulchandanis.com/blog/ai-compliance-function-architects-view/"/>
    <id>https://tarun.bulchandanis.com/blog/ai-compliance-function-architects-view/</id>
    <updated>2026-08-27T00:00:00.000Z</updated>
    <published>2026-08-27T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>KPMG&#39;s &#39;AI is poised to reshape compliance functions&#39; piece sketches the strategic direction. The architecture function&#39;s read is more specific: which compliance workflows are amenable to agent support, which require explicit guardrails, and where the audit-trail design choices land.</summary>
    <content type="html">&lt;p&gt;KPMG published &amp;quot;how AI is poised to reshape compliance
functions&amp;quot; earlier this year. The piece argues that the
compliance function is one of the highest-value
candidates for agentic AI augmentation: high volume of
structured work, clear rules, auditable outcomes, and
acute pressure on cost.&lt;/p&gt;
&lt;p&gt;The argument is correct at the level at which the article
operates. The architecture function&#39;s read of the same
material is more specific: which compliance workflows
are genuinely amenable to agent support, which require
explicit guardrails, and where the audit-trail design
choices land.&lt;/p&gt;
&lt;h2&gt;The workflows that are obvious candidates&lt;/h2&gt;
&lt;p&gt;Three workflow shapes are strong candidates.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Document review against a policy.&lt;/strong&gt; The agent reads a
document (a contract, a marketing claim, a customer
communication, a transaction record), compares it
against a defined policy, and flags compliance issues.
The agent&#39;s output is a recommendation; the human
compliance officer signs off. This shape is broadly
mature; multiple production deployments exist.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Filing preparation.&lt;/strong&gt; The agent assembles a regulatory
filing from underlying data sources, formats it
according to the regulator&#39;s published rules, and
prepares it for human review. The human reviews,
adjusts where needed, and submits. Most of the heavy
lift sits in the data assembly and the format
compliance; the agent does well on both.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Customer complaint triage.&lt;/strong&gt; The agent reads a
customer complaint, classifies it against the firm&#39;s
complaint taxonomy, routes it to the appropriate
handler and drafts an initial response. The human
handler reviews the draft and sends. This shape is
operationally mature in retail financial services.&lt;/p&gt;
&lt;h2&gt;The workflows that require explicit guardrails&lt;/h2&gt;
&lt;p&gt;Three workflow shapes are more delicate.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Final-decision workflows.&lt;/strong&gt; Where the compliance
function makes a final decision (a suspicious activity
report determination, a sanctions screening match
adjudication, a regulatory breach finding), the agent
should support but not decide. The architecture has to
make this distinction explicit: the human signs the
decision, the agent provides the reasoning trail, and
the audit log records both.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Investigation workflows.&lt;/strong&gt; The agent assembles
material relevant to a compliance investigation. The
risk is that the agent&#39;s selection biases the
investigation. The mitigation is in the audit log: the
investigator can see exactly what the agent considered
and what it did not.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cross-customer pattern detection.&lt;/strong&gt; Where the agent
operates across customer data (transaction monitoring,
market abuse detection), the data residency and access
controls become more demanding. The architecture has to
respect the data segregation the firm has committed to
in its regulatory filings.&lt;/p&gt;
&lt;h2&gt;The workflows the architecture function should resist&lt;/h2&gt;
&lt;p&gt;There are two workflow shapes I would currently keep
agents out of in regulated firms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sanctions list matching.&lt;/strong&gt; The downside of a false
negative is large; the matching rules are precise; the
existing systems already perform well. The marginal
value of an agent layer is small and the risk of
introducing soft errors is real. Stay with
rule-based systems with human review on close matches.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Senior management attestation.&lt;/strong&gt; Where SMCR or
equivalent regulation requires named senior manager
attestation, an agent should not be drafting that
attestation. The attestation is the named manager&#39;s
direct statement; tooling can support the data
gathering but should not draft the statement itself.&lt;/p&gt;
&lt;h2&gt;The audit-trail problem&lt;/h2&gt;
&lt;p&gt;The single largest architecture decision is the audit
trail. In a compliance workflow, the trail has to
support three audiences over the regulatory retention
period:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The internal audit function, reviewing periodically&lt;/li&gt;
&lt;li&gt;The external auditor, reviewing annually&lt;/li&gt;
&lt;li&gt;The regulator, reviewing on inspection or after an
incident&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The trail has to capture what the agent saw, what it
recommended, what the human reviewed, what the human
decided, and any divergence between the recommendation
and the decision. The retention period is typically
five to seven years and may extend to ten in some
regulatory contexts.&lt;/p&gt;
&lt;p&gt;Most existing systems do not log at this granularity.
The architecture function has to specify the logging
contract before deployment, and the operational
discipline to maintain it has to be funded.&lt;/p&gt;
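&lt;p&gt;A logging contract of this kind can be expressed as a record type. The sketch below is one possible shape under assumed field names; it is not a regulatory schema, and the case data is invented.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReviewRecord:
    """Hypothetical audit record for one agent-assisted compliance review."""
    case_id: str
    agent_inputs: tuple         # references to what the agent saw
    agent_recommendation: str   # what the agent recommended
    human_reviewer: str
    human_reviewed: tuple       # what the reviewer actually opened
    human_decision: str         # what the human decided
    rationale: str
    recorded_at: datetime

    @property
    def diverged(self):
        """True when the human decision differs from the recommendation."""
        return self.human_decision != self.agent_recommendation

    @property
    def unreviewed_inputs(self):
        """Agent inputs that the reviewer did not open."""
        return tuple(i for i in self.agent_inputs if i not in self.human_reviewed)


record = ReviewRecord(
    case_id="CASE-001",
    agent_inputs=("doc-1", "doc-2", "txn-9"),
    agent_recommendation="escalate",
    human_reviewer="j.smith",
    human_reviewed=("doc-1", "txn-9"),
    human_decision="close",
    rationale="Transaction explained by documented payroll run",
    recorded_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
```

&lt;p&gt;Deriving divergence from the stored fields, rather than logging it as a separate flag, keeps the record from contradicting itself.&lt;/p&gt;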
&lt;h2&gt;The operating model implication&lt;/h2&gt;
&lt;p&gt;A working AI-augmented compliance function has three
roles the firm may not currently have.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent operator.&lt;/strong&gt; The human in the loop. Reviews
recommendations, decides outcomes, captures rationale.
This is an evolved compliance officer role, not a new
one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent supervisor.&lt;/strong&gt; Reviews agent performance,
identifies systematic errors, manages the model and
prompt configuration. This is closer to a quant role
than a compliance role; the firm has to source it
carefully.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Accountable senior manager.&lt;/strong&gt; SMCR or equivalent
already requires this. The named senior manager has to
have visibility of how the AI-augmented workflows
operate and has to be able to defend the design
choices in front of the regulator.&lt;/p&gt;
&lt;p&gt;The architecture function should be designing the
operating model alongside the technical architecture,
not afterwards.&lt;/p&gt;
&lt;h2&gt;Where this leaves the firm&lt;/h2&gt;
&lt;p&gt;AI in compliance is a real opportunity. The architecture
choices determine whether it lands as a productivity
gain or as a regulatory exposure. The firms that get
this right are the ones where the architecture function
treats the compliance use case with the same rigour as
any other regulated workflow: explicit guardrails,
defensible audit trails, named accountabilities and
deliberate operating model design.&lt;/p&gt;
&lt;p&gt;Related reading: &lt;a href=&quot;/blog/auditing-agent-decisions/&quot;&gt;Auditing agent
decisions&lt;/a&gt;, &lt;a href=&quot;/blog/cyber-guardrails-ai-agents-regulated/&quot;&gt;Cyber
guardrails for AI agents in regulated
workflows&lt;/a&gt;,
&lt;a href=&quot;/blog/uk-fs-regulation-ai-architecture-2026/&quot;&gt;What UK financial services regulation means for AI
architecture in
2026&lt;/a&gt;,
&lt;a href=&quot;/blog/non-human-identity-ai-agents-ea-pattern/&quot;&gt;Non-human identity in the age of AI
agents&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Cyber guardrails for AI agents in regulated workflows: a reference architecture</title>
    <link href="https://tarun.bulchandanis.com/blog/cyber-guardrails-ai-agents-regulated/"/>
    <id>https://tarun.bulchandanis.com/blog/cyber-guardrails-ai-agents-regulated/</id>
    <updated>2026-08-20T00:00:00.000Z</updated>
    <published>2026-08-20T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>EY&#39;s &#39;reimagine your cyber guardrails to accelerate AI value&#39; piece sets the right strategic direction. The architecture function still has to translate that into specific controls. A practitioner&#39;s reference architecture for AI agents operating inside regulated workflows.</summary>
    <content type="html">&lt;p&gt;EY published &amp;quot;reimagine your cyber guardrails to
accelerate AI value&amp;quot; in early 2026. The piece argues
that conventional cyber controls were designed for
human-driven workflows and need adaptation for the
agentic era. The strategic argument is correct. The
architecture function still has to translate that into
specific controls.&lt;/p&gt;
&lt;p&gt;This piece sets out the reference architecture I use for
AI agents operating inside regulated workflows. The
focus is on the layer where the architecture function
actually has design choice: not the model itself, not
the business process, but the control surface in
between.&lt;/p&gt;
&lt;h2&gt;The four guardrail categories&lt;/h2&gt;
&lt;p&gt;Every agent in a regulated workflow needs controls in
four categories. The categories are not independent;
they reinforce each other and have to be designed
together.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Identity guardrails.&lt;/strong&gt; The agent has to have a
distinct identity, separate from the human operator
who configured it or the customer it acts for. The
identity has to be auditable, has to support
authorisation policy, and has to support revocation. See
&lt;a href=&quot;/blog/non-human-identity-ai-agents-ea-pattern/&quot;&gt;Non-human identity in the age of AI
agents&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Authority guardrails.&lt;/strong&gt; The agent&#39;s authority has
to be bounded. It can read from a defined set of data
sources. It can call a defined set of APIs. It can
write to a defined set of systems. It can spend a
defined budget. Each of these has to be explicit in the
authorisation policy and enforced at the runtime
boundary, not just at the agent configuration layer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Observation guardrails.&lt;/strong&gt; Every action the agent
takes has to be observable. The observation has to be
sufficient to reconstruct the agent&#39;s reasoning, not
just its output. This is where the audit trail design
sits.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Reversal guardrails.&lt;/strong&gt; Where the agent&#39;s actions
can be reversed (financial transactions, customer
communications, system changes), the reversal path has
to be designed alongside the forward path. Where the
actions cannot be reversed (cryptographic operations,
external API calls, regulatory submissions), the agent
should not be allowed to take them without explicit
human-in-the-loop confirmation.&lt;/p&gt;
&lt;h2&gt;The runtime architecture&lt;/h2&gt;
&lt;p&gt;Six components turn up in every working implementation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent registry.&lt;/strong&gt; Each agent has a record. The record
includes the agent&#39;s purpose, its authorisation policy,
its accountable senior manager, its model and prompt
configuration, its lifecycle status, and its incident
history.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Policy engine.&lt;/strong&gt; Authorisation decisions are made by
the policy engine, not by the agent itself. The agent
makes a request; the policy engine returns allow or
deny. The policy engine is auditable independently of
the agent.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tool gateway.&lt;/strong&gt; Agents do not call tools directly.
They call the tool gateway, which enforces the policy,
logs the call and forwards to the underlying tool if
allowed. This is where MCP integrations land in
practice.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Audit log.&lt;/strong&gt; Every action, every decision, every tool
call lands in an immutable audit log. The log is
queryable, retrievable for the regulatory retention
period, and tamper-evident.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Override interface.&lt;/strong&gt; Human operators can override
agent decisions. The override is logged, named, and
auditable as a first-class event.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Incident workflow.&lt;/strong&gt; When an agent does something
unexpected (a tool call denied, an unusual reasoning
trace, a quality threshold breach), the incident
workflow notifies the accountable senior manager and
captures the root cause.&lt;/p&gt;
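&lt;p&gt;As an illustration of how the policy engine and tool gateway fit together, here is a minimal Python sketch. The class names, the &lt;code&gt;claims-agent&lt;/code&gt; identity and the &lt;code&gt;lookup_policy&lt;/code&gt; tool are all hypothetical, and the in-memory list stands in for a real immutable audit log. It is a sketch of the pattern under those assumptions, not a production design.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-agent authorisation policy: the tools an agent may call."""
    agent_id: str
    allowed_tools: frozenset


@dataclass
class PolicyEngine:
    """Decides allow or deny; the agent never makes this decision itself."""
    policies: dict = field(default_factory=dict)

    def register(self, policy: Policy) -> None:
        self.policies[policy.agent_id] = policy

    def is_allowed(self, agent_id: str, tool: str) -> bool:
        policy = self.policies.get(agent_id)
        # Unknown agents are denied by default.
        return policy is not None and tool in policy.allowed_tools


@dataclass
class ToolGateway:
    """Single choke point: checks policy, logs the call, then forwards it."""
    engine: PolicyEngine
    tools: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def call(self, agent_id: str, tool: str, **kwargs: Any) -> Any:
        allowed = self.engine.is_allowed(agent_id, tool)
        # Log every decision, including denials, before anything is forwarded.
        self.audit_log.append(
            {"agent": agent_id, "tool": tool, "args": kwargs, "allowed": allowed}
        )
        if not allowed:
            raise PermissionError(f"{agent_id} may not call {tool}")
        return self.tools[tool](**kwargs)


engine = PolicyEngine()
engine.register(Policy("claims-agent", frozenset({"lookup_policy"})))
gateway = ToolGateway(engine, tools={"lookup_policy": lambda ref: {"ref": ref}})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A call the policy allows is forwarded and logged; a call outside the policy is logged and then refused with a &lt;code&gt;PermissionError&lt;/code&gt;, so the denial is itself an auditable event.&lt;/p&gt;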
&lt;h2&gt;The threat model&lt;/h2&gt;
&lt;p&gt;Three threats matter and have to be designed against.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Prompt injection.&lt;/strong&gt; An agent reading customer-provided
content (an email, a document, a chat message) can be
manipulated by carefully crafted content into taking
actions the operator did not intend. The mitigation is
in the authority guardrail (the agent does not have
authority to do dangerous things in the first place) and
in the observation guardrail (unusual actions trigger
review before completion).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tool confusion.&lt;/strong&gt; An agent in a complex environment
with many tools can call the wrong tool against the
wrong data. The mitigation is in the policy engine
(strict scope enforcement) and in the tool gateway
(per-tool monitoring for anomalous call patterns).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cascading agent calls.&lt;/strong&gt; Agents calling other agents
can create dependency chains that are hard to audit.
The mitigation is in the audit log (the full chain has
to be reconstructable) and in the policy engine (chain
depth is bounded).&lt;/p&gt;
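&lt;p&gt;One way to bound chain depth is to pass an explicit call context from agent to agent and refuse to extend it past a limit. The sketch below assumes a hypothetical &lt;code&gt;CallContext&lt;/code&gt; type and an illustrative limit of three; the real bound, and where it is enforced, are decisions for the firm.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from dataclasses import dataclass

MAX_CHAIN_DEPTH = 3  # illustrative limit; the real bound is a firm-level decision


@dataclass(frozen=True)
class CallContext:
    """Carries the chain of agent identities behind a request."""
    chain: tuple = ()

    def extend(self, agent_id: str) -> "CallContext":
        new_chain = self.chain + (agent_id,)
        # Refuse to deepen the chain beyond the bound.
        if len(new_chain) > MAX_CHAIN_DEPTH:
            raise PermissionError("agent call chain too deep: " + " -> ".join(new_chain))
        return CallContext(new_chain)


ctx = CallContext().extend("intake-agent").extend("pricing-agent")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the context records every agent in the chain, the same object that enforces the bound also gives the audit log what it needs to reconstruct the full chain.&lt;/p&gt;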
&lt;h2&gt;Where this lands in delivery&lt;/h2&gt;
&lt;p&gt;A reference architecture is only useful if it can be
delivered. The architectures I see working in practice
share three characteristics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The architecture is platformised.&lt;/strong&gt; The agent
registry, policy engine, tool gateway and audit log are
shared services across the firm&#39;s agents, not bespoke
to each use case. The cost-to-build of the first agent
is high; the cost-to-build of the tenth agent is
modest.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The threat model is documented.&lt;/strong&gt; The threats above
and the firm-specific additions are explicit. Each
guardrail control is mapped to the threats it
addresses. The mapping is reviewed periodically.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The accountability is named.&lt;/strong&gt; Each agent has an
accountable senior manager. The SMCR framework already
requires this for regulated UK firms; the architecture
should reinforce it rather than work around it.&lt;/p&gt;
&lt;h2&gt;Related work&lt;/h2&gt;
&lt;p&gt;For more on the broader operating model, see &lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;A
reference architecture for agentic AI in the regulated
enterprise&lt;/a&gt;,
&lt;a href=&quot;/blog/auditing-agent-decisions/&quot;&gt;Auditing agent decisions&lt;/a&gt;,
&lt;a href=&quot;/blog/identity-first-security-perimeter/&quot;&gt;Identity-first security: rethinking the enterprise
perimeter&lt;/a&gt;,
and &lt;a href=&quot;/blog/ai-compliance-function-architects-view/&quot;&gt;How AI is reshaping the compliance function: an
architect&#39;s view&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Agentic commerce: the integration architecture nobody is talking about</title>
    <link href="https://tarun.bulchandanis.com/blog/agentic-commerce-integration-architecture/"/>
    <id>https://tarun.bulchandanis.com/blog/agentic-commerce-integration-architecture/</id>
    <updated>2026-08-13T00:00:00.000Z</updated>
    <published>2026-08-13T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Accenture has published &#39;the dawn of the agentic deal&#39; and &#39;is your company ready for agentic commerce&#39;. PwC has &#39;real change agents&#39;. BCG has a $200B agentic AI opportunity piece. Every framing is at the value layer. The integration architecture underneath is where the real work sits.</summary>
    <content type="html">&lt;p&gt;The consultancy discourse on agentic commerce has, in
the last twelve months, settled into a familiar shape.
Accenture&#39;s &amp;quot;dawn of the agentic deal&amp;quot; and &amp;quot;agentic
commerce&amp;quot; pieces. PwC&#39;s &amp;quot;real change agents&amp;quot;. BCG&#39;s
$200 billion agentic AI opportunity for technology
service providers. McKinsey&#39;s banking and marketing
workflow pieces. Every framing is at the value layer:
what the agents will do, what the business model looks
like, how the firm captures the upside.&lt;/p&gt;
&lt;p&gt;The integration architecture underneath this is where
the actual work sits. That layer is conspicuously absent
from the public commentary.&lt;/p&gt;
&lt;p&gt;This piece is for the architects who have to build it.&lt;/p&gt;
&lt;h2&gt;What agentic commerce actually requires&lt;/h2&gt;
&lt;p&gt;The shape is straightforward once you specify it. An
agentic commerce flow involves an agent (either acting
for the customer or acting for the firm) that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Discovers what is available&lt;/li&gt;
&lt;li&gt;Negotiates terms&lt;/li&gt;
&lt;li&gt;Confirms intent&lt;/li&gt;
&lt;li&gt;Triggers fulfilment&lt;/li&gt;
&lt;li&gt;Settles payment&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Each step has to interact with at least one back-office
system. In a regulated firm, each step also has to leave
an audit trail, has to respect customer consent, has to
support reversibility, and has to be observable in
production.&lt;/p&gt;
&lt;p&gt;Most enterprise estates are not built to support this
flow.&lt;/p&gt;
&lt;h2&gt;The five integration layers&lt;/h2&gt;
&lt;p&gt;Five layers turn up in every working agentic commerce
implementation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. The catalogue layer.&lt;/strong&gt; The agent needs structured
access to whatever it discovers. For most
firms, this means an MCP-compliant catalogue server that
exposes products, prices, availability, terms and
constraints in a format agents can read directly. The
existing e-commerce APIs are usually not sufficient; they
are designed for browser-driven UI, not agent-driven
exploration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. The negotiation layer.&lt;/strong&gt; Where the agent has
authority to negotiate (volume discounts, structured
terms, custom payment arrangements), the negotiation has
to be bounded. The boundaries are commercial decisions
the firm has to make explicitly and that the architecture
has to enforce. An agent that can offer arbitrary
discounts will, eventually, offer arbitrary discounts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. The consent and confirmation layer.&lt;/strong&gt; Before the
transaction commits, the customer (or the customer&#39;s
agent, if delegated) has to have confirmed. The
confirmation has to be cryptographically auditable, has
to be linked to the specific transaction, and has to be
retrievable on demand for the regulatory retention
period.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. The fulfilment layer.&lt;/strong&gt; The transaction triggers
back-office processes. For physical goods, this means
the inventory and logistics systems. For services, this
means the service-provisioning systems. For financial
products, this means the booking and settlement systems.
Each of these has to accept agent-originated requests
and treat them with the same rigour as human-originated
requests.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. The settlement layer.&lt;/strong&gt; Payment has to settle. For
regulated firms, this includes KYC and AML checks even
where the customer is an existing customer. The
settlement has to be reversible during the regulatory
dispute window, and the reversal has to flow back
through all four upstream layers cleanly.&lt;/p&gt;
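&lt;p&gt;To make the consent and confirmation layer concrete, here is a minimal sketch of a confirmation record bound to one specific transaction. It assumes a shared-secret HMAC for brevity; a real deployment would more likely use asymmetric signatures with keys held in an HSM or KMS. The key, field names and sample order are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical; real keys live in an HSM or KMS


def canonical(transaction: dict) -> bytes:
    # Stable serialisation so the same transaction always produces the same bytes.
    return json.dumps(transaction, sort_keys=True, separators=(",", ":")).encode()


def confirm(transaction: dict, confirmed_by: str) -> dict:
    """Build a confirmation record bound to this exact transaction."""
    digest = hashlib.sha256(canonical(transaction)).hexdigest()
    message = (confirmed_by + ":" + digest).encode()
    tag = hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()
    return {"confirmed_by": confirmed_by, "transaction_sha256": digest, "tag": tag}


def verify(transaction: dict, record: dict) -> bool:
    """True only if the record matches the transaction and the tag is intact."""
    expected = confirm(transaction, record["confirmed_by"])
    return hmac.compare_digest(expected["tag"], record["tag"])


order = {"sku": "A-100", "quantity": 2, "unit_price": "19.99"}
record = confirm(order, "customer-42")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Verification succeeds for the order as confirmed and fails if any field of the order changes afterwards, which is the property that links the confirmation to the specific transaction.&lt;/p&gt;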
&lt;h2&gt;Where the architecture choices land&lt;/h2&gt;
&lt;p&gt;Three choices determine whether the implementation works
or breaks under load.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Catalogue: MCP or custom?&lt;/strong&gt; The MCP standard is
maturing. A firm building a custom catalogue interface
for agents is building a bespoke layer that will need
revisiting in 12-18 months. A firm building against MCP
is betting on a standard that may or may not stick. The
defensible position, in my view, is to expose both: a
custom interface for the firm&#39;s own agent stack and an
MCP-compliant interface for third-party agents.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Negotiation: in-band or out-of-band?&lt;/strong&gt; A negotiation
that happens inside the catalogue interaction is
operationally cleaner but commercially more constrained.
A negotiation that happens out-of-band (the agent
contacts a sales workflow that may include human
intervention) is commercially more flexible but
operationally harder to audit. The choice depends on the
firm&#39;s product mix and the regulatory envelope.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Settlement: same-rails or new-rails?&lt;/strong&gt; Settling agent
transactions on the same rails as human transactions
keeps the back office consistent but inherits the
back-office&#39;s existing constraints. Building new rails
(agent-specific settlement) keeps the agent flow clean
but creates a second class of transaction that has to
be reconciled. Most firms should default to same-rails
with explicit agent-flag instrumentation.&lt;/p&gt;
&lt;h2&gt;The audit trail problem&lt;/h2&gt;
&lt;p&gt;In a regulated firm, every step of an agentic commerce
flow has to be auditable. The trail has to capture:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What the agent saw at each step (catalogue contents,
terms, availability)&lt;/li&gt;
&lt;li&gt;What the agent recommended or attempted&lt;/li&gt;
&lt;li&gt;What the customer or principal confirmed&lt;/li&gt;
&lt;li&gt;What the back-office systems received and processed&lt;/li&gt;
&lt;li&gt;What the settlement layer cleared&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Most existing systems do not log at the granularity this
requires. The architecture function has to specify the
logging contract before deployment, not retrofit it
afterwards.&lt;/p&gt;
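&lt;p&gt;A logging contract of this kind can be sketched as a typed record plus a hash chain that makes later edits detectable. The field names below are illustrative and simply mirror the list above; the in-memory list is a stand-in for whatever append-only store the firm actually uses.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    """One step of the flow; field names are illustrative, mirroring the list above."""
    step: str
    agent_saw: dict
    agent_attempted: str
    principal_confirmed: str
    back_office_received: str
    settlement_cleared: str


def append(log: list, record: AuditRecord) -> None:
    # Each entry includes the hash of the previous entry, so edits break the chain.
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = json.dumps(asdict(record), sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    log.append({"prev": prev_hash, "body": body, "hash": entry_hash})


def chain_is_intact(log: list) -> bool:
    prev_hash = "0" * 64
    for entry in log:
        recomputed = hashlib.sha256((prev_hash + entry["body"]).encode()).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != recomputed:
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, AuditRecord("discover", {"sku": "A-100", "price": "19.99"},
                        "quote requested", "", "", ""))
append(log, AuditRecord("settle", {"sku": "A-100", "price": "19.99"},
                        "payment submitted", "customer-42", "order 7781", "cleared"))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Specifying the record type up front is the point: each system in the flow either populates its field or the gap is visible before deployment rather than during an investigation.&lt;/p&gt;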
&lt;h2&gt;What this looks like in practice&lt;/h2&gt;
&lt;p&gt;I have been building a related system (CANVAS, the
internal application and vendor approval workflow at
Sonnedix) over the last 18 months. It is not exactly
agentic commerce but it shares the architectural shape:
agent-mediated decisions with full audit trails and
reversibility. See &lt;a href=&quot;/blog/case-study-canvas/&quot;&gt;the CANVAS case
study&lt;/a&gt; for the underlying
patterns.&lt;/p&gt;
&lt;h2&gt;Where this leaves the firm&lt;/h2&gt;
&lt;p&gt;Agentic commerce is operationally heavier than the
consultancy framing implies. The value capture is real;
the integration work to get there is non-trivial; the
architecture choices have multi-year consequences.&lt;/p&gt;
&lt;p&gt;For firms that are starting this work in 2026, my
recommendation is to invest in the catalogue layer first
(MCP-compliant where possible), to build the audit trail
contract before the first agent goes into production,
and to treat the negotiation layer as a deliberate
commercial decision rather than a default vendor
configuration.&lt;/p&gt;
&lt;p&gt;Related reading: &lt;a href=&quot;/blog/mcp-enterprise-standard/&quot;&gt;MCP is the most important enterprise
standard nobody is
implementing&lt;/a&gt;,
&lt;a href=&quot;/blog/auditing-agent-decisions/&quot;&gt;Auditing agent decisions&lt;/a&gt;,
&lt;a href=&quot;/blog/case-study-canvas/&quot;&gt;CANVAS: the approval workflow no commercial product
covers&lt;/a&gt;, and &lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;A reference
architecture for agentic AI in the regulated
enterprise&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>S/4HANA in the agentic era: where the enterprise architecture function sits</title>
    <link href="https://tarun.bulchandanis.com/blog/s4hana-agentic-era/"/>
    <id>https://tarun.bulchandanis.com/blog/s4hana-agentic-era/</id>
    <updated>2026-08-06T00:00:00.000Z</updated>
    <published>2026-08-06T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>McKinsey&#39;s S/4HANA alliance, BCG&#39;s partnership with Conduct, Accenture&#39;s SAP practice, EY&#39;s S/4HANA transformation work. Everyone is publishing on ERP modernisation in the agentic era from the advisor&#39;s vantage. The architecture function&#39;s read of the same shift is different and operationally more specific.</summary>
    <content type="html">&lt;p&gt;The major consultancies have, over the last twelve
months, all converged on the same theme. McKinsey
formalised its SAP alliance and ran the Value Finder
work; BCG announced its Conduct partnership in May 2026,
aimed specifically at AI-driven ERP transformation;
Accenture continues to run the largest SAP practice in
the world; EY published the &amp;quot;S/4HANA transformation
success: the human factor&amp;quot; piece a few months back.&lt;/p&gt;
&lt;p&gt;The collective message: ERP transformation is being
reshaped by the agentic AI shift, and the firms that get
this right will do meaningfully better than the firms
that treat the AI layer and the ERP layer as separate
programmes.&lt;/p&gt;
&lt;p&gt;The architecture function&#39;s read of the same shift is
different. This piece sets out where the EA function sits
when an S/4HANA programme has to land in an agentic
environment.&lt;/p&gt;
&lt;h2&gt;What the consultancy framing gets right&lt;/h2&gt;
&lt;p&gt;Two things, in my reading.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The integration layer is the leverage point.&lt;/strong&gt; The
single biggest mistake in a legacy ERP estate is to
treat the ERP as a black box that exposes a small set of
interfaces and otherwise stays untouched. The agentic
shift makes this position untenable: agents need to
read from and write to the ERP in ways the original
integration design never anticipated. The firms that
will deliver the value are the ones that re-architect
the integration layer deliberately.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The data layer is the second leverage point.&lt;/strong&gt; S/4HANA
moves the firm to a single in-memory data platform. The
firms that deliver the value treat this as the
foundation for the analytical layer, not just the
operational one. The agentic shift compounds the value:
agents reading from a clean canonical data layer
operate noticeably better than agents reading from a
fragmented one.&lt;/p&gt;
&lt;h2&gt;What the consultancy framing misses&lt;/h2&gt;
&lt;p&gt;Three things, in my experience, having delivered the
architecture of a CHF 350M+ S/4HANA programme.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The change control layer is structurally
underweighted.&lt;/strong&gt; Most S/4HANA programmes treat change
control as a project-phase concern. In the agentic era,
change control becomes a steady-state concern: every
model update, every agent capability change, every
integration change has to flow through governance that
accounts for the AI-specific risks. The firms that will
get this right are building this discipline into the
operating model from day one, not retrofitting it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The exit path is rarely modelled.&lt;/strong&gt; A firm that
deploys agents against S/4HANA has, in practice,
chosen a coupling between the SAP estate, the chosen
model provider and the chosen agent framework. The cost
of switching any of those after the fact is non-trivial.
The architecture function should be modelling the exit
paths during design, not after.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The lights-on cost of the agentic layer is missing.&lt;/strong&gt;
The consultancy commentary covers the value capture
from agents; it is largely silent on the operating
cost. In a regulated firm, the agentic layer adds:
ongoing model inventory maintenance, ongoing prompt
governance, ongoing exit path testing, ongoing model
performance monitoring, ongoing audit trail review.
These are non-trivial steady-state costs the EA function
has to budget for.&lt;/p&gt;
&lt;h2&gt;Where the EA function sits in the programme&lt;/h2&gt;
&lt;p&gt;Four explicit responsibilities.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. The integration architecture.&lt;/strong&gt; The EA function
owns the design choice between embedded SAP agents,
agents that call into S/4HANA via the SAP-published
APIs, and agents that read from a derived data layer.
The trade-offs are real and they affect the operating
model for years.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. The data model.&lt;/strong&gt; The S/4HANA data model becomes
the operational data model. The EA function owns the
question of what gets canonical status, what gets
derived, what gets duplicated and what gets archived.
The agentic layer compounds the importance of this
choice.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. The agent capability boundary.&lt;/strong&gt; Which business
processes does the firm allow agents to operate
against? Which require human-in-the-loop? Which are
agent-blocked entirely? The EA function should be
authoring this, not inheriting it from the AI vendor&#39;s
default configuration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. The vendor lock-in posture.&lt;/strong&gt; The S/4HANA programme
locks the firm to SAP. The agentic layer can either
compound that lock-in or partially offset it depending
on where the integration sits and which standards the
agentic layer uses (MCP being the most relevant). The
EA function should be running this trade-off explicitly.&lt;/p&gt;
&lt;h2&gt;What the right operating model looks like&lt;/h2&gt;
&lt;p&gt;The S/4HANA programmes that land well in the agentic era
share three characteristics.&lt;/p&gt;
&lt;p&gt;A small architecture cell embedded in the programme,
with explicit authority over the integration, data and
agent boundary decisions. Not a steering committee; a
working group with delivery authority.&lt;/p&gt;
&lt;p&gt;A documented architecture position that the programme
delivers against. Updated periodically, but not
re-litigated continuously. Most programmes underweight
this.&lt;/p&gt;
&lt;p&gt;A regulatory and risk function that engages with the
architecture position rather than reviewing it at end
of phase. The agentic layer makes after-the-fact review
materially worse than concurrent engagement.&lt;/p&gt;
&lt;h2&gt;Where this leaves the firm&lt;/h2&gt;
&lt;p&gt;ERP transformation in the agentic era is a meaningfully
different programme from ERP transformation five years
ago. The EA function has more weight to carry; the
integration and data choices have larger downstream
implications; the steady-state cost of governance is
higher.&lt;/p&gt;
&lt;p&gt;For the firms doing this work now, my recommendation is
to invest disproportionately in the architecture cell
during the design phase, document the position
carefully, and budget for the steady-state cost of
agentic governance from day one.&lt;/p&gt;
&lt;p&gt;Related reading: &lt;a href=&quot;/blog/large-scale-erp-transformation-lessons/&quot;&gt;Lessons from large-scale ERP
transformation&lt;/a&gt;,
&lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;A reference architecture for agentic AI in the
regulated
enterprise&lt;/a&gt;,
&lt;a href=&quot;/blog/mcp-enterprise-standard/&quot;&gt;MCP is the most important enterprise standard nobody is
implementing&lt;/a&gt;,
and &lt;a href=&quot;/blog/architectural-fitness-functions-framework/&quot;&gt;Architectural fitness functions: a practical
framework&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>What UK financial services regulation means for AI architecture in 2026</title>
    <link href="https://tarun.bulchandanis.com/blog/uk-fs-regulation-ai-architecture-2026/"/>
    <id>https://tarun.bulchandanis.com/blog/uk-fs-regulation-ai-architecture-2026/</id>
    <updated>2026-07-30T00:00:00.000Z</updated>
    <published>2026-07-30T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>The UK financial services regulators have moved noticeably faster on AI in the last 18 months than the consensus expected. The FCA, PRA and Bank of England statements landed in a usable shape; the practical implications for the architecture function are concrete.</summary>
    <content type="html">&lt;p&gt;The UK financial services regulators have moved
noticeably faster on AI in the last 18 months than the
consensus expected. The posture set out in the joint
Bank of England, PRA and FCA discussion paper on AI and
machine learning (DP5/22) has firmed up through the
PRA&#39;s SS1/23 model risk management supervisory
statement and the FCA&#39;s 2025-26 AI strategy. Most of the EY, KPMG, Deloitte and
PwC commentary on this is calibrated to the advisor&#39;s
audience: the board, the executive committee, the chief
risk officer.&lt;/p&gt;
&lt;p&gt;The architecture function lives one layer further in.
The architects building the systems that will or will
not be compliant with this regulatory envelope need a
different read of the same material. This piece is for
that audience.&lt;/p&gt;
&lt;h2&gt;What the regulators have actually said&lt;/h2&gt;
&lt;p&gt;Three things have crystallised.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Model risk management applies to AI.&lt;/strong&gt; SS1/23 confirms
that machine learning models, including generative AI
where used in regulated activities, fall within the
scope of model risk management. The implications are
specific: model inventory, model validation, model
performance monitoring and governance escalation all
apply, and the firm&#39;s three lines of defence have to
adjust.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Outsourcing rules apply to AI vendors.&lt;/strong&gt; The FCA and
PRA are explicit that an AI service provider (whether
that is a foundation model API, a vertical AI tool, or
an embedded AI feature in a SaaS product) is an
operational outsource arrangement. The firm has to do
the same vendor due diligence, the same exit planning
and the same operational resilience analysis it does
for any other material outsource.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Senior management responsibility is named.&lt;/strong&gt; SMCR
already places accountability with named senior managers
for material risks. The regulator has been clear that AI
risk is a material risk; the accountability is
allocated, and the architecture function&#39;s design
choices are auditable against that accountability.&lt;/p&gt;
&lt;p&gt;These three together set the operating envelope for any
AI deployment in a regulated UK firm.&lt;/p&gt;
&lt;h2&gt;What the architecture function has to do&lt;/h2&gt;
&lt;p&gt;Six implications.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Maintain a live model inventory.&lt;/strong&gt; Every AI system
in the firm has to be in a register. The register has
to include the model, the use case, the data sources,
the human-in-the-loop arrangements, the validation
status and the named senior manager accountable. This
isn&#39;t a one-off document; it is a continuously
maintained artefact, and the architecture function is
typically the owner.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Design for auditable decision trails.&lt;/strong&gt; Where an AI
system contributes to a regulated decision (customer
onboarding, credit, suitability, complaints handling,
trading), the trail of inputs, model outputs and human
override has to be auditable for the regulatory
retention period. This sits on top of conventional
logging and requires deliberate design.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Treat AI vendor selection as an outsource
decision.&lt;/strong&gt; Foundation model APIs are operational
outsources. The architecture function should be running
them through the firm&#39;s outsource framework rather than
the technology procurement framework. The two have
materially different gates.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Build exit paths.&lt;/strong&gt; The outsource framework
requires demonstrable exit paths. For a foundation model
provider that means an alternative provider has to be
viable, the firm&#39;s prompts and data sets have to be
portable, and the operational continuity in a
provider-failure scenario has to be tested. Most firms
have not done this work.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. Plan for the EU AI Act overlap.&lt;/strong&gt; UK firms with EU
customers operate in two regulatory envelopes. The EU
AI Act&#39;s high-risk system requirements apply where the
firm&#39;s AI system falls into one of the Act&#39;s high-risk
categories and is used to deliver services to EU
customers. The architecture function has to design for
the more demanding of the two regimes, not the easier
one. See &lt;a href=&quot;/blog/data-residency-ai-workloads-uk-eu/&quot;&gt;Data residency for AI workloads&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;6. Get ahead of MCP and agent governance.&lt;/strong&gt; Agent
interoperability standards (MCP in particular) are
maturing faster than the regulatory commentary. A firm
that deploys agents without explicit governance over
which tools they can call, against which data, with
what authority, is exposed. The architecture function
should be the source of this governance, not the legal
function. See &lt;a href=&quot;/blog/auditing-agent-decisions/&quot;&gt;Auditing agent
decisions&lt;/a&gt; and &lt;a href=&quot;/blog/mcp-enterprise-standard/&quot;&gt;MCP is
the most important enterprise standard nobody is
implementing&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;What this looks like in delivery&lt;/h2&gt;
&lt;p&gt;In practice, the firms doing this well share four
characteristics.&lt;/p&gt;
&lt;p&gt;A senior architect with explicit accountability for the
AI risk envelope. Not a chief AI officer; a chief
architect who treats AI risk as part of the broader
architecture remit.&lt;/p&gt;
&lt;p&gt;A model inventory that is updated as part of the change
process, not as a standalone exercise. The change
control workflow refuses to release a system that
changes the AI use case without an updated register
entry.&lt;/p&gt;
&lt;p&gt;An outsource gate that AI vendors actually pass through.
The technology team can recommend; the outsource
committee approves; the architecture function provides
the technical assessment.&lt;/p&gt;
&lt;p&gt;A regulatory radar wired into the architecture function.
When the FCA publishes a new portfolio letter or the
PRA issues a new supervisory statement, the architecture
function reads it, assesses it against the firm&#39;s
estate, and tables an impact paper to the relevant
committee.&lt;/p&gt;
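&lt;p&gt;The change-control gate described above, where a release is refused unless the register already reflects the change, can be sketched as a simple check. The types, field names and the sample &lt;code&gt;complaints-triage&lt;/code&gt; entry are hypothetical; a real gate would sit inside the firm&#39;s release tooling and read from the live register.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterEntry:
    """Hypothetical model inventory entry; fields follow the register described earlier."""
    system: str
    model: str
    use_case: str
    accountable_manager: str
    validation_status: str


@dataclass(frozen=True)
class ChangeRequest:
    system: str
    model: str
    use_case: str


def release_allowed(change: ChangeRequest, register: dict) -> bool:
    """Allow release only if the register already reflects the change."""
    entry = register.get(change.system)
    if entry is None:
        return False  # system is not in the inventory at all
    return (
        entry.model == change.model
        and entry.use_case == change.use_case
        and entry.validation_status == "validated"
        and entry.accountable_manager != ""
    )


register = {
    "complaints-triage": RegisterEntry(
        "complaints-triage", "model-v3", "complaint routing", "Head of Operations", "validated"
    )
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A change that swaps the model or the use case without a matching register entry fails the check, which is what keeps the inventory current as a side effect of normal delivery.&lt;/p&gt;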
&lt;h2&gt;Where this leaves the firm&lt;/h2&gt;
&lt;p&gt;The UK regulatory posture on AI is, on the whole,
proportionate. It is not designed to prevent firms
from deploying AI; it is designed to make them deploy
it carefully. The architecture function is the part of
the firm best placed to deliver that carefulness in
practice.&lt;/p&gt;
&lt;p&gt;For more on the broader operating model implications,
see also &lt;a href=&quot;/blog/reference-architecture-agentic-ai-regulated/&quot;&gt;A reference architecture for agentic AI in the
regulated
enterprise&lt;/a&gt;,
&lt;a href=&quot;/blog/ai-compliance-function-architects-view/&quot;&gt;How AI is reshaping the compliance function: an
architect&#39;s view&lt;/a&gt;,
and the existing pieces on &lt;a href=&quot;/blog/auditing-agent-decisions/&quot;&gt;auditing agent
decisions&lt;/a&gt; and &lt;a href=&quot;/blog/cursor-regulated-policy/&quot;&gt;Cursor
in a regulated industry&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Event-driven architecture: when it adds value, and when it doesn&#39;t</title>
    <link href="https://tarun.bulchandanis.com/blog/event-driven-architecture-when-it-works/"/>
    <id>https://tarun.bulchandanis.com/blog/event-driven-architecture-when-it-works/</id>
    <updated>2026-07-23T00:00:00.000Z</updated>
    <published>2026-07-23T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Event-driven architecture has become a default recommendation in modern technical practice. The reality across enterprise contexts is more nuanced. A practical framework for assessing when an event-driven approach genuinely adds value, and when it introduces complexity that a simpler synchronous design would have avoided.</summary>
    <content type="html">&lt;h2&gt;Executive summary&lt;/h2&gt;
&lt;p&gt;Event-driven architecture has become, over the past decade, one of
the more confidently recommended patterns in modern technical
practice. The general advice — that systems should communicate
through events rather than direct synchronous calls — is now widely
adopted and substantially codified in the technical literature.&lt;/p&gt;
&lt;p&gt;The reality across enterprise contexts is more nuanced. Event-driven
architecture, applied with discipline in the right context, delivers
material benefits in decoupling, resilience, and scalability. Applied
without that discipline, or applied to contexts where it does not
fit, it introduces a category of complexity that organisations
underestimate at the outset and pay for over years.&lt;/p&gt;
&lt;p&gt;This piece sets out a framework for assessing when an event-driven
approach genuinely adds value, when a simpler synchronous design
would have been the right choice, and what the architectural
indicators are for each. It is not an argument against event-driven
architecture; it is an argument for selecting it deliberately
rather than adopting it as the default.&lt;/p&gt;
&lt;h2&gt;Where the pattern came from, and why it has been over-applied&lt;/h2&gt;
&lt;p&gt;Event-driven architecture, in its modern form, emerged from several
parallel developments. The growth of large-scale internet platforms
demonstrated the limits of tightly coupled synchronous architectures.
The emergence of mature message broker technology — Apache Kafka in
particular — made event streaming practical at enterprise scale. The
broader move to microservices created a category of inter-service
communication problems that event-driven patterns were well suited
to address. And the influential body of writing from companies that
had successfully scaled their architectures using event-driven
patterns established the credibility of the approach.&lt;/p&gt;
&lt;p&gt;The result, by the mid-2020s, was a strong default in favour of
event-driven architecture in new system design. This default has
produced both genuine benefits and, in my observation, a substantial
amount of over-application.&lt;/p&gt;
&lt;p&gt;The over-application is not surprising. The benefits of event-driven
architecture are visible in headline scenarios — the canonical case
studies from large-scale platforms — and the costs are diffused
across the operational lifecycle of the system. The right comparison
is rarely available at the design stage: a synchronous version of
the same system, running in the same context, against which the
event-driven version can be benchmarked. Without that comparison,
the recommendation to adopt the pattern looks costless. It is not.&lt;/p&gt;
&lt;h2&gt;The five contexts where event-driven architecture adds value&lt;/h2&gt;
&lt;p&gt;The pattern adds genuine value in specific contexts. Five of them
recur across the enterprises I have observed.&lt;/p&gt;
&lt;h3&gt;1. Asynchronous, long-running business processes&lt;/h3&gt;
&lt;p&gt;When a business process has steps that are inherently long-running
— typically because they depend on external systems, on human
action, or on temporal triggers — modelling the process as an
event-driven workflow is materially cleaner than the synchronous
alternative.&lt;/p&gt;
&lt;p&gt;The architectural marker is a process where one or more steps may
take seconds to days, where the calling system has no reasonable
basis for blocking on the result, and where the eventual completion
needs to trigger downstream actions. A vendor onboarding workflow,
an insurance claim assessment, a customer credit check, a multi-stage
fulfilment process: each of these is a natural fit for an
event-driven design.&lt;/p&gt;
&lt;p&gt;The wrong alternative in this context is typically a polling
approach, where the calling system repeatedly checks the status of
the long-running process. Polling is workable at small scale and
becomes operationally fragile at larger scale. The event-driven
alternative is cleaner.&lt;/p&gt;
&lt;h3&gt;2. Multi-consumer data distribution&lt;/h3&gt;
&lt;p&gt;When the same data needs to be distributed to multiple downstream
systems, each of which consumes it for different purposes and on
different cadences, an event-driven approach offers significant
architectural advantages over the alternative.&lt;/p&gt;
&lt;p&gt;The architectural marker is a set of systems all dependent on a
common data source — typically a system of record, such as a
customer master, a product catalogue, an order book — where each
consuming system has its own data model, its own consumption
frequency, and its own latency tolerance.&lt;/p&gt;
&lt;p&gt;The pattern that emerges in this context is the publication of
domain events from the system of record onto a durable event log,
with each downstream consumer subscribing to the events relevant
to its purpose. The system of record does not need to know about
the consumers. The consumers do not need to coordinate with each
other. New consumers can be added without changes to the producer.
This is a class of decoupling that synchronous architectures
genuinely cannot match.&lt;/p&gt;
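&lt;p&gt;A minimal sketch of that shape, assuming an in-memory list as a
stand-in for a durable log. The names (EventLog, Consumer, catch_up)
and the event fields are hypothetical and do not correspond to any
particular broker API. The point it illustrates is that each consumer
owns its own read position, so the producer never needs to know who
is listening.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Illustrative sketch only: an in-memory stand-in for a durable event log.
# EventLog, Consumer and the event fields are hypothetical names.


class EventLog:
    """Append-only log; the producer knows nothing about its consumers."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset):
        # Return every event at or after the given offset.
        return self._events[offset:]


class Consumer:
    """Tracks its own offset, so each consumer reads at its own cadence."""

    def __init__(self, name, handler):
        self.name = name
        self.offset = 0
        self.handler = handler

    def catch_up(self, log):
        batch = log.read_from(self.offset)
        for event in batch:
            self.handler(event)
        self.offset += len(batch)
        return len(batch)


log = EventLog()
billing_seen = []
search_seen = []

billing = Consumer("billing", billing_seen.append)
search = Consumer("search", lambda e: search_seen.append(e["customer_id"]))

log.append({"type": "CustomerCreated", "customer_id": "C-1"})
billing.catch_up(log)  # billing keeps up event by event
log.append({"type": "CustomerUpdated", "customer_id": "C-1"})
log.append({"type": "CustomerCreated", "customer_id": "C-2"})
search.catch_up(log)   # search reads later, in one batch
billing.catch_up(log)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Adding a third consumer here means constructing another Consumer
and nothing else. Neither the log nor the code that appends to it
changes.&lt;/p&gt;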
&lt;h3&gt;3. Audit, observability and replay requirements&lt;/h3&gt;
&lt;p&gt;In contexts where the system needs to maintain a complete,
replayable history of significant events — for audit, for analytical
purposes, for the ability to reconstruct system state at a prior
point in time — an event-driven architecture using an immutable
event log is a natural fit.&lt;/p&gt;
&lt;p&gt;This is particularly relevant in regulated industries, where the
audit story matters substantially. An event-sourced subsystem
provides a complete, append-only record of every change to its
state. The state itself is derivable from the event log at any
point. The audit requirement is satisfied as a property of the
architecture rather than as a separate logging concern.&lt;/p&gt;
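&lt;p&gt;The claim that state is derivable from the log can be made concrete
with a toy example. This is a sketch under simplifying assumptions:
the account domain, the event shapes and the function names are all
hypothetical, and a real event store would add persistence, ordering
guarantees and snapshots.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Illustrative sketch only: a toy event-sourced account balance.
# The event shapes and function names are hypothetical.


def apply_event(balance, event):
    """Pure transition: current state plus one event gives the next state."""
    if event["type"] == "Deposited":
        return balance + event["amount"]
    if event["type"] == "Withdrawn":
        return balance - event["amount"]
    return balance  # unknown event types leave the state unchanged


def state_at(events, upto):
    """Rebuild state by replaying the first `upto` events from the log."""
    balance = 0
    for event in events[:upto]:
        balance = apply_event(balance, event)
    return balance


# The log is append-only; a tuple is used so nothing is updated in place.
event_log = (
    {"seq": 1, "type": "Deposited", "amount": 100},
    {"seq": 2, "type": "Withdrawn", "amount": 30},
    {"seq": 3, "type": "Deposited", "amount": 5},
)

current = state_at(event_log, len(event_log))  # state now
after_two = state_at(event_log, 2)             # state as of event 2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The audit question, what the balance was after the second event and
why, is answered by the same replay function that produces current
state. There is no separate audit log to keep consistent.&lt;/p&gt;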
&lt;p&gt;The marker for this context is a regulatory or operational
requirement to demonstrate the complete history of a particular
domain — financial transactions, regulatory submissions, clinical
decisions, vendor risk assessments. In each case, the cost of the
event-sourcing pattern is justified by the audit story it provides.&lt;/p&gt;
&lt;h3&gt;4. Cross-organisational integration with limited coordination&lt;/h3&gt;
&lt;p&gt;When systems owned by different organisations need to exchange data
or trigger actions in each other&#39;s domains — partner APIs,
multi-enterprise supply chains, regulator-to-firm submissions,
inter-bank transactions — event-driven patterns reduce the
coordination cost meaningfully.&lt;/p&gt;
&lt;p&gt;The synchronous alternative requires that each integration is
designed around the specifics of the partner system, with each
endpoint negotiated, each schema versioned bilaterally, and each
change managed through bilateral discussion. The event-driven
alternative, particularly using a published industry-standard
schema or a mediating event hub, allows each party to evolve
their internal systems with greater independence.&lt;/p&gt;
&lt;p&gt;This is a context where the gain is largely organisational rather
than technical. The technical complexity of event-driven integration
is non-trivial. The reduction in coordination overhead is what
justifies it.&lt;/p&gt;
&lt;h3&gt;5. Genuine scale and throughput requirements&lt;/h3&gt;
&lt;p&gt;When the volume of inter-system communication is such that the
synchronous alternative would impose unsustainable operational
demands — typically measured in events per second rather than in
business transactions per day — event-driven architecture is the
appropriate response.&lt;/p&gt;
&lt;p&gt;The architectural marker is a system whose throughput requirements
exceed what a synchronous design could comfortably sustain on the
available infrastructure. Telemetry pipelines, financial market
data, large-scale logistics tracking, IoT sensor data — each of
these is a context where the volume itself justifies the event-driven
design.&lt;/p&gt;
&lt;p&gt;In these contexts the choice is not really event-driven versus
synchronous; it is which event-driven design to adopt. The
synchronous alternative is not viable at the required throughput.&lt;/p&gt;
&lt;h2&gt;The three contexts where event-driven architecture is the wrong choice&lt;/h2&gt;
&lt;p&gt;Counterpart to the above. Three contexts where, in my observation,
the synchronous alternative would have been the better choice and
the event-driven design has caused problems that are still being
absorbed.&lt;/p&gt;
&lt;h3&gt;1. Simple request-response interactions&lt;/h3&gt;
&lt;p&gt;The most common over-application is the use of event-driven
patterns for what is, in essence, a simple request-response
interaction. The calling system needs the response of the downstream
system to proceed. The downstream system can respond synchronously
in a small number of milliseconds. The semantics of the interaction
are straightforward call-and-return.&lt;/p&gt;
&lt;p&gt;In this context, modelling the interaction as an event-driven
exchange introduces several costs without commensurate benefit.
The latency increases, because the request has to traverse the
event bus rather than a direct call. The error handling becomes
more complex, because the caller now has to handle the possibility
that the response never arrives. The event broker becomes an
additional operational dependency, and a potential single point of
failure, that the simpler synchronous design would not have introduced.&lt;/p&gt;
&lt;p&gt;The architectural marker for this anti-pattern is a system where
the calling code, in effect, has to wait for the response anyway —
either through correlation IDs and asynchronous waits, or through
explicit polling — and the event-driven nature has become a kind
of complication wrapped around a synchronous interaction.&lt;/p&gt;
&lt;p&gt;The recommendation in this context is to use synchronous calls and
to accept the coupling. The coupling is real but typically modest
in this kind of interaction, and the operational cost of the
event-driven alternative substantially exceeds the cost of the
direct dependency.&lt;/p&gt;
&lt;h3&gt;2. Transactional consistency requirements&lt;/h3&gt;
&lt;p&gt;When the business semantics of an interaction require that multiple
state changes either all happen or none of them happen — the
classic atomic transaction — event-driven architecture introduces
a category of complexity that organisations consistently underestimate.&lt;/p&gt;
&lt;p&gt;The synchronous alternative, particularly within a single database,
provides atomic transactions as a property of the underlying
system. Two updates within the same transaction either both commit
or both roll back. The application code does not need to model
the failure scenarios in detail; the database handles them.&lt;/p&gt;
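&lt;p&gt;As a small illustration of that baseline, the sketch below uses
the sqlite3 module from the Python standard library. The accounts
table, the transfer function and the fail_midway flag are hypothetical
and exist only to simulate a failure between the two updates.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Illustrative sketch only: atomicity provided by the database itself.
import sqlite3

conn = sqlite3.connect(":memory:")
with conn:  # seed data, committed when the block exits
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


def transfer(conn, source, target, amount, fail_midway=False):
    # The connection context manager commits on success and rolls back
    # if an exception escapes the block.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, source),
        )
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, target),
        )


try:
    transfer(conn, "A", "B", 40, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)  # the debit was rolled back with the failure

transfer(conn, "A", "B", 40)
after_success = balances(conn)  # both updates committed together
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The application code contains no compensation logic at all. The
failure path is a single rollback that the database performs.&lt;/p&gt;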
&lt;p&gt;The event-driven alternative replaces this with a saga pattern, in
which the equivalent atomicity is achieved through a sequence of
events with compensating actions. Saga patterns are well-documented
and well-understood as a concept. They are also operationally
demanding to implement correctly, and the failure scenarios they
need to model are numerous.&lt;/p&gt;
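&lt;p&gt;A deliberately small sketch of the saga idea follows. It runs in a
single process, which a real saga does not, and every name in it
(run_saga, SagaFailed, the order steps) is hypothetical. It shows only
the core mechanic: each step is paired with a compensating action, and
a failure triggers the compensations of the completed steps in reverse
order.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Illustrative sketch only: an in-process saga runner.
# A real saga coordinates separate services through events.


class SagaFailed(Exception):
    pass


def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If a step raises, run the compensations of the steps that already
    completed, in reverse order, then report which step failed.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for _done_name, undo in reversed(completed):
                undo()  # in production this can itself fail and needs retries
            raise SagaFailed(name) from exc
        completed.append((name, compensate))


journal = []


def record(entry):
    return lambda: journal.append(entry)


def decline_payment():
    raise RuntimeError("card declined")


steps = [
    ("reserve_stock", record("stock reserved"), record("stock released")),
    ("take_payment", decline_payment, record("payment refunded")),
    ("book_courier", record("courier booked"), record("courier cancelled")),
]

try:
    run_saga(steps)
    outcome = "completed"
except SagaFailed as failed:
    outcome = "failed at " + str(failed)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Even this toy version leaves open the questions that make sagas
demanding in practice: what happens when a compensation fails, when
the coordinator crashes between steps, or when another process reads
the reserved stock before it is released.&lt;/p&gt;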
&lt;p&gt;The architectural marker for this anti-pattern is a system where
the engineering team is spending substantial effort on the design,
testing, and operational handling of compensating actions to
maintain a property that a synchronous database transaction would
have provided as a baseline.&lt;/p&gt;
&lt;p&gt;The recommendation in this context is to keep the transactional
interaction within a single bounded context, served by a single
database with synchronous transactions, and to use event-driven
patterns only for the genuinely cross-context interactions where
the coupling cost would otherwise be high.&lt;/p&gt;
&lt;h3&gt;3. Small-team contexts with limited operational maturity&lt;/h3&gt;
&lt;p&gt;The third anti-pattern is less about the workload characteristics
and more about the organisational context. Event-driven architecture
requires a meaningful investment in the operational platform — the
event broker, the monitoring infrastructure, the schema registry,
the dead-letter queue handling, the replay tooling. The investment
is appropriate at sufficient scale and with sufficient organisational
maturity. It is disproportionate at smaller scale or in less mature
contexts.&lt;/p&gt;
&lt;p&gt;The architectural marker for this anti-pattern is a small engineering
team, often early in its operational maturity, that has adopted
event-driven patterns by default and is spending a meaningful
proportion of its time on platform issues rather than on the
business problem the systems are meant to solve.&lt;/p&gt;
&lt;p&gt;The recommendation in this context is to start with a synchronous
architecture, to maintain a clear set of internal interfaces along
domain boundaries, and to migrate to event-driven patterns only
when the specific need arises and the operational platform exists
to support it. Event-driven architecture is a destination some
systems should reach. It is rarely the right starting point.&lt;/p&gt;
&lt;h2&gt;A practical assessment framework&lt;/h2&gt;
&lt;p&gt;For architecture leaders evaluating whether to adopt an event-driven
approach for a specific system or system boundary, a small set of
questions can structure the decision.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;If &amp;quot;yes&amp;quot;&lt;/th&gt;
&lt;th&gt;If &amp;quot;no&amp;quot;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Is the interaction inherently asynchronous, long-running, or temporally decoupled?&lt;/td&gt;
&lt;td&gt;Event-driven design likely justified.&lt;/td&gt;
&lt;td&gt;Synchronous likely simpler.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Are there multiple downstream consumers of the same data?&lt;/td&gt;
&lt;td&gt;Event-driven enables decoupling benefit.&lt;/td&gt;
&lt;td&gt;Synchronous typically adequate.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Is there a regulatory or audit requirement for a complete event history?&lt;/td&gt;
&lt;td&gt;Event-sourcing pattern likely valuable.&lt;/td&gt;
&lt;td&gt;Standard logging typically sufficient.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Does the throughput exceed what synchronous infrastructure can comfortably support?&lt;/td&gt;
&lt;td&gt;Event-driven is required, not optional.&lt;/td&gt;
&lt;td&gt;Synchronous remains viable.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Do you have the operational maturity to run an event-driven platform reliably?&lt;/td&gt;
&lt;td&gt;Proceed if other answers warrant.&lt;/td&gt;
&lt;td&gt;Defer event-driven adoption until the platform investment is justified.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Does the interaction require transactional consistency across the systems involved?&lt;/td&gt;
&lt;td&gt;Be cautious of saga complexity.&lt;/td&gt;
&lt;td&gt;Event-driven is likely a clean fit.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;A system that answers &amp;quot;yes&amp;quot; to several of the first four questions
and has the operational maturity for the fifth is likely a good
fit for an event-driven design. A system that answers &amp;quot;yes&amp;quot; to
the sixth and &amp;quot;no&amp;quot; to the others is likely better served by a
synchronous approach.&lt;/p&gt;
&lt;p&gt;The framework is not a decision tree in any rigorous sense. It is
a structured way to surface the considerations that, in practice,
are too often skipped at the design stage.&lt;/p&gt;
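&lt;p&gt;For teams that prefer to record the answers in a design review
template, the two rules of thumb above can be written down as a short
function. This is a sketch, not a scoring model: the field names and
return strings are hypothetical, &amp;quot;several&amp;quot; is read as two or more,
and every combination the text does not cover is returned as a
judgement call.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Illustrative sketch only: the six questions as a structured record.
from dataclasses import dataclass


@dataclass
class Assessment:
    asynchronous_or_long_running: bool      # question 1
    multiple_consumers: bool                # question 2
    audit_history_required: bool            # question 3
    throughput_beyond_synchronous: bool     # question 4
    operationally_mature: bool              # question 5
    needs_transactional_consistency: bool   # question 6


def recommend(a):
    fit_signals = sum([
        a.asynchronous_or_long_running,
        a.multiple_consumers,
        a.audit_history_required,
        a.throughput_beyond_synchronous,
    ])
    # Rule 1: two or more of the first four, plus operational maturity.
    if fit_signals not in (0, 1) and a.operationally_mature:
        return "event-driven likely a good fit"
    # Rule 2: yes to the sixth question and no to all of the others.
    if (a.needs_transactional_consistency
            and fit_signals == 0
            and not a.operationally_mature):
        return "synchronous likely better"
    # Anything else is deliberately left to the architect.
    return "judgement call"


payments_ledger = Assessment(False, False, False, False, False, True)
customer_master = Assessment(True, True, False, False, True, False)
single_signal = Assessment(True, False, False, False, True, False)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The value of writing it down is less the function than the record:
the answers given at design time are available when the decision is
revisited later.&lt;/p&gt;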
&lt;h2&gt;Implications for architecture leaders&lt;/h2&gt;
&lt;p&gt;Three broader implications.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The architecture function should resist the default toward
event-driven patterns.&lt;/strong&gt; The general technical literature, the
vendor narratives, and the broader practitioner conversation all
tend toward recommending event-driven approaches. The architecture
function&#39;s role is to apply judgement to that recommendation in
the specific context of the organisation and the specific workload.
This is not a fashionable position, but it is, in my observation,
the position that produces better outcomes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The operational maturity question is more important than the
technical-fit question.&lt;/strong&gt; The technical fit for event-driven
architecture is generally easier to assess than the operational
maturity required to run it. The technical fit determines whether
the pattern can produce value. The operational maturity determines
whether it will. Organisations that adopt event-driven patterns
before establishing the operational platform consistently
underestimate the cost.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The pattern should be revisited at major architectural milestones.&lt;/strong&gt;
Systems evolve. A system that did not warrant event-driven design
at inception may warrant it as it grows. A system that was designed
event-driven may be carrying complexity that is no longer
justified. The architecture function should treat the choice as
reversible at major milestones — at significant scale changes, at
major reorganisations, at the end of programme phases — and should
revisit it rather than treating it as a once-and-done decision.&lt;/p&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;Event-driven architecture is a powerful tool when applied to
the right problem. It is also one of the patterns most prone to
over-application in current practice, and the operational cost
of that over-application accumulates across the lifecycle of the
systems it affects.&lt;/p&gt;
&lt;p&gt;For architecture leaders, the recommendation is to retain the
pattern in the toolkit, to apply it where the context warrants,
and to maintain the discipline to choose a simpler synchronous
design where it does not. The choice between event-driven and
synchronous is not a question of which is better in the abstract.
It is a question of which is appropriate to the specific system,
the specific workload, and the specific organisation. The framework
above is one way of structuring that question.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Lessons from large-scale ERP transformation: an architect&#39;s perspective</title>
    <link href="https://tarun.bulchandanis.com/blog/large-scale-erp-transformation-lessons/"/>
    <id>https://tarun.bulchandanis.com/blog/large-scale-erp-transformation-lessons/</id>
    <updated>2026-07-16T00:00:00.000Z</updated>
    <published>2026-07-16T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Mega-scale ERP transformation programmes have a consistent set of architectural challenges that do not appear in the vendor narrative or the typical analyst commentary. Five lessons from leading the architecture for a multi-region S/4HANA programme of nine-figure scale, drawn from the experience of running the programme rather than from the post-completion case study.</summary>
    <content type="html">&lt;h2&gt;Executive summary&lt;/h2&gt;
&lt;p&gt;Large-scale ERP transformation — the multi-year, multi-region,
nine-figure programmes that migrate a global organisation to a
modern ERP platform — remains one of the most complex undertakings
an enterprise architecture function can lead. The published
commentary on these programmes tends to fall into two categories:
vendor narratives, which emphasise the destination at the expense
of the journey, and consultancy retrospectives, which emphasise
the methodology at the expense of the architectural detail.&lt;/p&gt;
&lt;p&gt;What is rarely discussed in either category is what the architecture
function actually grapples with during the programme — the specific
decisions, the moments where the architectural posture matters most,
and the lessons that translate across organisations and across
ERP product lines. This piece is an attempt to fill that gap, drawn
from leading the global architecture for an S/4HANA transformation
of nine-figure scale across multiple regions.&lt;/p&gt;
&lt;p&gt;Five lessons. None of them are surprises individually. The combination
is what produces the difference between a programme that delivers
the architectural foundation the organisation will use for fifteen
years and a programme that delivers a system that runs but does
not provide that foundation.&lt;/p&gt;
&lt;h2&gt;Context&lt;/h2&gt;
&lt;p&gt;For the avoidance of doubt: the lessons that follow are drawn
from a programme that ran over multiple years, across more than
fifteen country deployments, with a total programme value in the
hundreds of millions, on SAP&#39;s S/4HANA platform. The specifics of
the organisation are not relevant to the lessons; the lessons
themselves are. Where the lesson is specific to SAP or to S/4HANA,
this is called out. Where it is more general, it generalises to
other large-scale ERP platforms (Oracle Cloud ERP, Workday for
finance, Microsoft Dynamics 365 for finance and operations) and
to the broader class of multi-year platform programmes.&lt;/p&gt;
&lt;h2&gt;Lesson 1: The architectural foundations laid in the first six months determine the next ten years&lt;/h2&gt;
&lt;p&gt;The single most consequential set of decisions in a large-scale
ERP programme is the foundational architectural choices made
during the design and blueprinting phase, typically in the first
six months of the programme.&lt;/p&gt;
&lt;p&gt;These decisions are well-rehearsed at the level of headline choices
— template versus country-by-country design, single instance versus
multiple instances, public versus private cloud, on-premises versus
hosted, the depth of the deployment model — and the programme
governance generally treats them as the major decision points they
are. What is less well-recognised is the long tail of architectural
decisions that follow from each headline choice, each of which
constrains the next decade of evolution.&lt;/p&gt;
&lt;p&gt;The decision to deploy a global template, for instance, carries
with it implicit choices about the depth of the master data
hierarchy, the granularity of organisational units, the design of
the chart of accounts, the structure of material masters and
customer masters, and dozens of similar foundational data structures.
Each of these choices is technically reversible after go-live but
is, in practice, prohibitively expensive to change. The combined
set establishes the shape of the architectural ground on which the
organisation will build for the next decade.&lt;/p&gt;
&lt;p&gt;The lesson is that the architectural rigour applied to these
foundational decisions needs to be substantially greater than the
rigour typically applied. Not in the sense of more meetings or
heavier documentation, but in the sense of explicit consideration
of the long-term consequences of each decision, articulated in a
form that is understandable to non-architects and that survives
the inevitable personnel changes through the programme&#39;s lifetime.&lt;/p&gt;
&lt;p&gt;For architecture leaders entering a programme at this scale, the
recommendation is to establish, before the design phase begins, a
small set of architectural principles that the foundational
decisions will be tested against. These principles should be
written down, agreed at executive level, and revisited explicitly
at each foundational decision point. Principles that are obvious
in the abstract become decisive when applied to specific design
choices under time pressure.&lt;/p&gt;
&lt;h2&gt;Lesson 2: The integration architecture is more important than the ERP itself&lt;/h2&gt;
&lt;p&gt;A consistent observation across large-scale ERP programmes is the
disproportionate amount of value, and the disproportionate amount
of risk, that sits in the integration architecture rather than in
the ERP itself.&lt;/p&gt;
&lt;p&gt;The ERP is a packaged product. Its capabilities are largely defined
by the vendor&#39;s design. The organisation&#39;s design freedom is in
how it configures the product, not in what the product does. The
integration architecture, by contrast, is bespoke. It connects the
ERP to dozens, sometimes hundreds, of surrounding systems — the
CRM, the e-commerce platform, the manufacturing execution systems,
the customer-facing portals, the analytics platform, the regulatory
reporting systems, the data warehouses. The shape of these
integrations is the organisation&#39;s choice and the organisation&#39;s
responsibility.&lt;/p&gt;
&lt;p&gt;In programmes that go well, the integration architecture is treated
as a first-class deliverable, with named ownership, formal
governance, explicit standards, and rigorous testing. In programmes
that go poorly, the integration architecture is treated as a series
of necessary plumbing exercises, delegated to the systems integrator,
and discovered to be the source of operational issues only after
go-live.&lt;/p&gt;
&lt;p&gt;The lesson, in practical terms, is to invest disproportionately in
the integration architecture during the design phase, to maintain
a single integration architect of sufficient seniority across the
full programme lifecycle, and to ensure that the integration design
decisions are surfaced to the same governance forum as the ERP
design decisions. The integration architecture is not an
implementation detail.&lt;/p&gt;
&lt;p&gt;For organisations on S/4HANA specifically, this lesson applies with
particular force because the move to S/4HANA from a prior ECC
environment typically requires meaningful redesign of the existing
integration pattern. The HANA-native database structures, the
adoption of CDS views as the data access pattern, the BTP-based
integration approach, and the broader move away from older
middleware patterns all combine to make the integration architecture
genuinely new rather than incrementally updated.&lt;/p&gt;
&lt;h2&gt;Lesson 3: The data is harder than the process&lt;/h2&gt;
&lt;p&gt;In every large-scale ERP programme I have observed or been part of,
the data migration and the master data design have absorbed
significantly more effort, surfaced significantly more issues, and
created significantly more delay than the process design.&lt;/p&gt;
&lt;p&gt;This is counter-intuitive at the outset. The headline framing of an
ERP programme is about business processes — order-to-cash,
procure-to-pay, record-to-report, hire-to-retire. The implication
is that the work is process design and configuration. In practice,
once the process design is settled, configuration is a comparatively
mechanical exercise. The difficult work is the data.&lt;/p&gt;
&lt;p&gt;The difficulty arises from several sources. Historical data quality
is almost always worse than the legacy systems&#39; apparent state
suggests. The reconciliation of master data across multiple legacy
systems, often each with its own slightly different version of the
same customer, product or supplier, is intricate and politically
fraught. The design of the new master data hierarchies is a place
where the organisational politics of how the business is structured
become visible, and where the architectural decision is constrained
by the operating model decision. And the migration itself, when
it finally happens, requires sustained attention to quality at a
level that few programmes plan for adequately.&lt;/p&gt;
&lt;p&gt;The lesson is that data should be treated as a first-class workstream
from the beginning of the programme, with its own architect, its
own governance, and its own quality measurement. The data workstream
is not a sub-task of the process workstream. The relative weighting
of effort, in a well-run programme, is closer to a 60-40 split
between data and process than the 20-80 split that the early
programme planning typically assumes.&lt;/p&gt;
&lt;p&gt;For SAP S/4HANA programmes specifically, this lesson is compounded
by the discipline that S/4HANA&#39;s data model imposes — the move from
the more permissive ECC data model to the stricter S/4HANA structures
means that data quality issues which were tolerable in the legacy
estate become blocking issues in the new platform. Programmes that
underestimate this typically discover it during the first cutover,
which is the worst possible moment to discover it.&lt;/p&gt;
&lt;h2&gt;Lesson 4: The governance model has to be designed for years, not for the programme&lt;/h2&gt;
&lt;p&gt;Large-scale ERP programmes are typically governed through a programme
structure — a steering committee, a programme board, workstream
leads, design authorities — that is appropriate for the programme&#39;s
duration but is rarely fit for the post-programme operating model.
When the programme concludes, the governance structure dissolves,
and the architecture function is left with an inadequately designed
ongoing model.&lt;/p&gt;
&lt;p&gt;This is the source of a familiar pattern: the platform is delivered
to a high standard, the organisation goes live successfully, and
within eighteen months the platform begins to accumulate decisions
that do not fit the original architectural intent because the
governance forum that would have prevented them no longer exists
in its programme form.&lt;/p&gt;
&lt;p&gt;The lesson is that the programme governance model must be designed
with the post-programme operating model in mind. The Design Authority
that governs decisions during the programme should evolve into the
Architecture Governance Board that governs decisions after the
programme. The standards and patterns established during the
programme should be documented in a form that survives the programme&#39;s
conclusion and is maintained by a named function.&lt;/p&gt;
&lt;p&gt;In a programme I led, this transition was planned eighteen months
before go-live, with the post-programme governance structure
explicitly designed and the transition of named roles into the
post-programme model documented as part of the programme closure.
The pattern proved durable. The platform&#39;s architectural integrity
was preserved through the first three years of operation under the
governance model that the programme had established. This is not
the typical outcome.&lt;/p&gt;
&lt;p&gt;The recommendation for architecture leaders is to make the
post-programme operating model an explicit programme deliverable,
to design it with the same rigour as any other architectural design,
and to ensure that the transition is treated as a critical milestone
in the programme plan, not as a clean-up activity after go-live.&lt;/p&gt;
&lt;h2&gt;Lesson 5: The change management story is an architectural concern&lt;/h2&gt;
&lt;p&gt;The framing of change management as a separate workstream from
architecture is, in my view, increasingly unhelpful. The two are
deeply intertwined and the architectural decisions made during the
programme have direct implications for the change management
challenge.&lt;/p&gt;
&lt;p&gt;A platform that adopts the vendor&#39;s standard process where the
organisation&#39;s current process is materially different will require
a much larger change management effort than a platform that has
been configured to accommodate the existing process. The architectural
choice between standardising on the vendor&#39;s process and customising
to the existing process is, in part, a change management choice
disguised as an architectural one.&lt;/p&gt;
&lt;p&gt;The lesson is that the architecture function should be involved in
the change management strategy, not as a peripheral input but as a
central voice. The architectural choices about template adherence,
configuration depth, the degree of process harmonisation, and the
phasing of country deployments each have direct implications for
the magnitude of the change management challenge and the likelihood
of successful adoption.&lt;/p&gt;
&lt;p&gt;A practical pattern that works well is the establishment of a joint
architecture-and-change forum, meeting on a defined cadence during
the programme, where the architectural decisions are tested against
their change management implications and vice versa. This forum
serves as a check on the natural tendency of the architecture
function to favour cleaner technical designs at the expense of
adoption, and on the natural tendency of the change management
function to favour minimal disruption at the expense of long-term
technical health.&lt;/p&gt;
&lt;h2&gt;Implications for transformation leaders&lt;/h2&gt;
&lt;p&gt;Three broader implications for executives sponsoring or leading
transformation programmes of this scale.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The architecture function is not a supporting function in
these programmes.&lt;/strong&gt; It is the function that determines whether the
investment produces the foundation the organisation will operate
on for the next decade. The seniority, the authority, and the
durability of the architecture function across the programme
lifecycle are not implementation details; they are programme
success factors.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The selection of the systems integrator should be informed by
their architecture leadership, not by their workforce capacity.&lt;/strong&gt;
The systems integrator&#39;s commercial proposition is typically
framed around capacity — the number of consultants, the day rates,
the project management methodology. The actual differentiator
across systems integrators in large-scale ERP work is the calibre
of their lead architects and the depth of their architectural
practice. This should be weighted heavily in the selection process.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The programme should expect to revisit foundational decisions at
defined points.&lt;/strong&gt; Some of the foundational decisions made in the
first six months will look incorrect with the benefit of two years
of programme experience. The governance model should include
defined review points at which the foundational decisions are
re-examined and, where appropriate, formally revised before the
cost of the original decision compounds further. This is
counter-cultural in programmes that are pressured to maintain
forward momentum, but the cost of revisiting a foundational
decision at month eighteen is, in almost every case, smaller than
the cost of carrying it through to go-live and beyond.&lt;/p&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;Large-scale ERP transformation remains, in 2026, one of the most
demanding undertakings an enterprise will take on. The vendor
products have matured; the implementation methodologies have
become more disciplined; the cloud-based delivery patterns have
removed some of the historical friction. The fundamental challenges
— the foundational decisions, the integration architecture, the
data, the governance model, the change management — remain.&lt;/p&gt;
&lt;p&gt;The lessons above are not the only ones I would draw from leading
work at this scale. They are the ones that I have not seen written
about with the directness that, in my view, they merit. For
architecture leaders entering or running programmes of this kind,
I hope they are useful. The work is hard and the published
guidance is, in places, less honest about the difficulty than
the work itself deserves.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Identity-first security: rethinking the enterprise perimeter in 2026</title>
    <link href="https://tarun.bulchandanis.com/blog/identity-first-security-perimeter/"/>
    <id>https://tarun.bulchandanis.com/blog/identity-first-security-perimeter/</id>
    <updated>2026-07-09T00:00:00.000Z</updated>
    <published>2026-07-09T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>The traditional network perimeter has been eroding for a decade. In 2026 it is, in most regulated enterprises, no longer the primary control. Identity has assumed that role, with significant implications for application design, vendor selection and incident response. A perspective for architecture leaders.</summary>
    <content type="html">&lt;h2&gt;Executive summary&lt;/h2&gt;
&lt;p&gt;The shift from network-perimeter security to identity-centric
security has been underway for the better part of a decade. The
trend is not new. What is new in 2026 is the degree to which the
shift is now structurally complete in most regulated enterprises,
the architectural implications that flow from this, and the
specific patterns that distinguish organisations that have made
the transition well from those still in the middle of it.&lt;/p&gt;
&lt;p&gt;This piece sets out the current state of identity-first security
architecture, the five implications that architecture leaders
should be working through, and a framework for assessing the
maturity of an organisation&#39;s identity posture.&lt;/p&gt;
&lt;p&gt;It is not a treatment of identity technology selection. The
question of which identity provider to deploy is well covered
elsewhere, and the answer in 2026 is for most organisations a
choice among a small number of well-established providers (Microsoft
Entra ID, Okta, Ping Identity, the AWS-native option for cloud-first
shops, and a small number of open-source alternatives for specific
contexts). The interesting questions are about how the architecture
function exercises the identity-first posture, not about which
product enables it.&lt;/p&gt;
&lt;h2&gt;The structural shift&lt;/h2&gt;
&lt;p&gt;The network perimeter — the firewall around the data centre, the
VPN as the controlled access path, the implicit trust granted to
traffic that originated inside the corporate network — was the
dominant security control for forty years. It is no longer.&lt;/p&gt;
&lt;p&gt;Several developments have combined to produce this outcome. The
adoption of public cloud means a meaningful proportion of the
estate is, by definition, outside any network perimeter the
organisation controls. The widespread move to SaaS for business
applications removes those systems from the perimeter entirely.
The structural shift to hybrid working has eliminated the
assumption that &amp;quot;user inside the network&amp;quot; is a meaningful concept.
The growing use of third-party integrations, partner APIs, and
managed services has multiplied the number of legitimate connections
that cross what used to be the perimeter. And the consistent
record of perimeter breaches — where attackers reach the network
interior and then move laterally with the implicit trust the
network model granted — has made the perimeter-trust model
demonstrably unsafe.&lt;/p&gt;
&lt;p&gt;The replacement, broadly described as zero trust or as identity-first
security, treats every access decision as an explicit authorisation
event. The trust granted to any given request is calculated based
on the identity of the requesting principal, the device that
principal is using, the context of the request, and the sensitivity
of the resource being accessed. The network from which the request
originates is, at most, one factor among several, and is treated
as untrusted by default.&lt;/p&gt;
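&lt;p&gt;A minimal sketch of such a decision in Python follows. The signal names, weights, and thresholds are hypothetical values chosen for illustration and are not drawn from any published framework. The only point it makes is that network location adds a small amount to the score and can never authorise a request on its own.&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical sensitivity tiers and the minimum trust score each requires.
REQUIRED_TRUST = {"public": 0, "internal": 40, "restricted": 70}


@dataclass(frozen=True)
class AccessRequest:
    principal_authenticated: bool  # identity verified by the identity provider
    mfa_used: bool                 # strong authentication on this session
    device_managed: bool           # device is enrolled and compliant
    unusual_context: bool          # e.g. unfamiliar location or time of day
    on_corporate_network: bool     # one factor among several, never sufficient
    resource_sensitivity: str      # key into REQUIRED_TRUST


def trust_score(req: AccessRequest) -> int:
    """Combine identity, device and context signals into a single score."""
    if not req.principal_authenticated:
        return 0  # no verified identity means no trust, whatever the network
    score = 40
    score += 25 if req.mfa_used else 0
    score += 25 if req.device_managed else 0
    score += 5 if req.on_corporate_network else 0
    score -= 30 if req.unusual_context else 0
    return max(score, 0)


def decide(req: AccessRequest) -> str:
    """Return 'allow' or 'deny' for one explicit authorisation event."""
    if not req.principal_authenticated:
        return "deny"
    required = REQUIRED_TRUST[req.resource_sensitivity]
    return "allow" if trust_score(req) >= required else "deny"
```

&lt;p&gt;Under these assumed weights, an authenticated user on the corporate network without strong authentication or a managed device is denied access to a restricted resource, while a remote user with both is allowed.&lt;/p&gt;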
&lt;p&gt;This is the model that the major enterprise security frameworks
have now substantively adopted — the NIST zero trust architecture
guidance, the UK National Cyber Security Centre&#39;s design principles,
the various sector-specific overlays from the FCA, the PRA, and
equivalent bodies in other jurisdictions. Identity, in this model,
is the primary control. The other controls are supporting.&lt;/p&gt;
&lt;h2&gt;Five implications for the architecture function&lt;/h2&gt;
&lt;p&gt;The implications for the architecture function are substantial.
Five that I would recommend any architecture leader work through
explicitly.&lt;/p&gt;
&lt;h3&gt;1. Integration with the enterprise identity provider becomes non-negotiable&lt;/h3&gt;
&lt;p&gt;A consequence of the identity-first posture is that every production
system needs to integrate with the enterprise identity provider.
Local user accounts, shared service accounts, and the various
forms of out-of-band authentication that have accumulated in most
enterprises over the years become, in this model, exceptions that
require explicit justification.&lt;/p&gt;
&lt;p&gt;The architectural implication is that the question &amp;quot;does this
system integrate with our identity provider&amp;quot; moves from a desirable
property to a hard prerequisite. New application onboarding,
vendor selection, and acquisition integration each need to apply
this filter. Applications that do not support modern federation
protocols — OpenID Connect, SAML 2.0, SCIM for provisioning — are
increasingly difficult to justify in a regulated environment.&lt;/p&gt;
&lt;p&gt;For the existing estate, the architecture function should expect
to spend a non-trivial proportion of its modernisation effort on
identity integration retrofits. This is unglamorous work that
rarely produces a visible new business capability. It is, however,
the work that materially reduces the organisation&#39;s security
exposure, and as such belongs near the top of the modernisation
backlog.&lt;/p&gt;
&lt;h3&gt;2. Service-to-service authentication needs the same discipline&lt;/h3&gt;
&lt;p&gt;The identity-first posture is sometimes applied conscientiously to
human users while being overlooked for service-to-service traffic.
This is a meaningful gap. The lateral movement that produces
serious breaches typically does not involve human user accounts;
it involves compromised service credentials, over-privileged
service accounts, and the various forms of implicit trust between
internal systems.&lt;/p&gt;
&lt;p&gt;The architectural response is to treat service identity with the
same rigour as user identity. Specifically: every service has a
named identity, ideally backed by an OAuth 2.1 client credentials
flow or equivalent. Service credentials are short-lived (ideally
minutes, not days). Permissions granted to a service are scoped to
the specific resources it requires, not to its containing system.
Service authentication is logged at the same level of detail as
user authentication. And shared service accounts — the credential
that several systems use to authenticate to a database, for
instance — are eliminated.&lt;/p&gt;
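&lt;p&gt;The checks listed above lend themselves to automation. The sketch below, in Python, assumes a hypothetical inventory record per service credential and a fifteen-minute lifetime ceiling; both are illustrative assumptions, and the scope naming convention is invented for the example.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import timedelta

# Assumed policy ceiling for credential lifetime; set this to the local standard.
MAX_LIFETIME = timedelta(minutes=15)


@dataclass(frozen=True)
class ServiceCredential:
    service_name: str
    lifetime: timedelta
    scopes: frozenset   # e.g. {"ledger:read"}; the naming scheme is hypothetical
    used_by: frozenset  # services observed presenting this credential


def findings(cred: ServiceCredential) -> list:
    """Return the hygiene rules this credential breaks, in a fixed order."""
    issues = []
    if cred.lifetime > MAX_LIFETIME:
        issues.append("lifetime exceeds policy ceiling")
    if any(scope == "*" or scope.endswith(":*") for scope in cred.scopes):
        issues.append("wildcard scope")
    if len(cred.used_by) > 1:
        issues.append("shared between services")
    return issues
```

&lt;p&gt;An empty list means the credential passes all three checks. A shared database credential with a ninety-day lifetime and a wildcard scope would fail every one of them.&lt;/p&gt;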
&lt;p&gt;This is non-trivial work, particularly in legacy estates. It is
also the work that closes the largest single category of
security gap I encounter in real production environments.&lt;/p&gt;
&lt;h3&gt;3. Permission models need to be explicit and reviewable&lt;/h3&gt;
&lt;p&gt;The identity-first model requires that the permissions granted to
each principal be explicit, scoped, and auditable. In practice,
this means moving away from broad role-based models toward more
fine-grained access patterns: attribute-based access control where
appropriate, just-in-time privilege elevation for high-risk
operations, time-bounded permissions for project-specific access,
and so on.&lt;/p&gt;
&lt;p&gt;The architectural decision is not which permission model to adopt
in the abstract but how to structure the organisation&#39;s permission
data such that the access decisions can be made, audited, and
revoked at the granularity the regulatory environment requires.
This is, increasingly, a data-architecture question as well as a
security-architecture question, and one where the two functions
need to be working in close partnership.&lt;/p&gt;
&lt;p&gt;A practical pattern emerging in the better-organised functions I
have observed is the codification of permission policies as
declarative artefacts in source control, applied uniformly across
the identity provider, the cloud platforms, and the application
layer. This pattern — sometimes described as &amp;quot;policy as code&amp;quot; —
brings the version control and review discipline that the rest of
the engineering organisation already applies to its software, to
the permission data that ultimately controls who can do what.&lt;/p&gt;
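&lt;p&gt;To make the idea concrete, here is a small Python sketch of a declarative policy artefact and an evaluator for it. The grant format, the principal names, and the resource paths are hypothetical. Real deployments would typically use a dedicated policy engine, so read this as an illustration of the shape of the data, not as a recommended implementation.&lt;/p&gt;

```python
from datetime import date
from fnmatch import fnmatchcase

# A declarative policy artefact as it might be held in source control.
# An expiry of None marks a standing grant; a date marks a time-bounded one.
POLICY = [
    {"principal": "svc-billing", "action": "read",
     "resource": "ledger/*", "expires": None},
    {"principal": "alice", "action": "write",
     "resource": "ledger/2026", "expires": date(2026, 9, 30)},
]


def is_permitted(policy, principal, action, resource, today):
    """Return True if any unexpired grant covers the requested access."""
    for grant in policy:
        if grant["principal"] != principal or grant["action"] != action:
            continue
        if not fnmatchcase(resource, grant["resource"]):
            continue
        expires = grant["expires"]
        if expires is None or expires >= today:
            return True
    return False  # default deny: no matching grant, no access
```

&lt;p&gt;Because the policy is plain data, a change to it goes through the same pull request and review process as any other change, and the time-bounded grant lapses without anyone having to remember to revoke it.&lt;/p&gt;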
&lt;h3&gt;4. The audit story is materially different&lt;/h3&gt;
&lt;p&gt;In a network-perimeter model, audit was largely a function of
network logs — what traffic crossed which boundary, when. In an
identity-first model, audit is a function of authorisation events
— who attempted to access what, was the request authorised, what
context informed the decision, and what action followed.&lt;/p&gt;
&lt;p&gt;The implication is that the organisation&#39;s logging and event
infrastructure needs to capture authorisation events at the level
of detail required for regulatory audit. This includes, at minimum,
the identity of the principal making the request, the resource
being accessed, the permissions evaluated, the decision reached,
and the context that informed the decision (including the device
context, the network context, and any risk signals).&lt;/p&gt;
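&lt;p&gt;The minimum field set described above can be expressed as a structured log record. The Python sketch below uses field names of my own choosing; they are assumptions, not a standard schema, and an organisation would align them with its existing logging conventions.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuthorisationEvent:
    principal: str                # who made the request
    resource: str                 # what was being accessed
    permissions_evaluated: tuple  # which permissions were checked
    decision: str                 # "allow" or "deny"
    device_context: dict          # e.g. managed or unmanaged, compliance state
    network_context: dict         # e.g. source network, geography
    risk_signals: tuple = ()      # any signals that informed the decision
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_log_line(self) -> str:
        """Serialise the event as one JSON line with a stable key order."""
        return json.dumps(asdict(self), sort_keys=True)
```

&lt;p&gt;Emitting one such line per decision, allowed or denied, gives the audit and analytics layers a consistent record to work from.&lt;/p&gt;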
&lt;p&gt;For organisations that have been operating under regulatory regimes
with strong audit requirements — financial services, healthcare,
defence — this is largely a continuation of an existing discipline,
albeit one that needs to be extended to cover the broader set of
identity events that the identity-first posture surfaces. For
organisations newer to this level of audit detail, the implication
is a meaningful investment in logging infrastructure, retention
policy, and the analytics layer that turns authorisation events
into actionable intelligence.&lt;/p&gt;
&lt;p&gt;A related point that often goes underdiscussed: the audit trail
generated by an identity-first posture is itself valuable as an
input to the architecture function&#39;s own measurement practice.
Patterns in authorisation events — which resources are accessed
most frequently, which permissions are exercised most often, which
access requests are denied — provide useful signal about the actual
versus intended use of the estate.&lt;/p&gt;
&lt;h3&gt;5. The vendor selection criteria change&lt;/h3&gt;
&lt;p&gt;The criteria the architecture function applies in evaluating new
vendors need to incorporate the identity-first posture explicitly.
Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Support for the organisation&#39;s federation protocols, in the
specific versions and configurations the organisation requires.&lt;/li&gt;
&lt;li&gt;Support for SCIM-based provisioning, with the data attributes
the organisation maintains in its identity provider.&lt;/li&gt;
&lt;li&gt;Granular permission models exposed through the vendor&#39;s API,
rather than coarse role-based ones.&lt;/li&gt;
&lt;li&gt;The vendor&#39;s own internal identity discipline: how the vendor
authenticates and authorises its own staff when they access the
customer&#39;s data, and what audit trail is provided for that access.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These criteria are, increasingly, table stakes for vendors selling
into regulated enterprises. The architecture function&#39;s role is
to ensure that the evaluation process applies them rigorously and
that vendor exceptions are not granted on commercial grounds
alone.&lt;/p&gt;
&lt;h2&gt;A working maturity framework&lt;/h2&gt;
&lt;p&gt;For architecture leaders looking to assess where their organisation
sits on the journey, a four-stage maturity framework is useful.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Characteristics&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Perimeter-centric (legacy)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Most production systems still depend on the network perimeter as the primary control. Local user accounts are common. Service authentication is via shared accounts or unscoped credentials. Lateral movement after a perimeter breach would be straightforward.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Federated (transitional)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enterprise identity provider integrated with the majority of production systems. Single sign-on widely available. Some service-to-service authentication still relies on legacy credentials. Some legacy systems remain on local accounts as exceptions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Identity-first (target)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Every production system integrated with the enterprise identity provider. Service-to-service authentication uses short-lived, scoped credentials. Permission policies are version-controlled. Audit trail captures authorisation events at the required level of detail.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Identity-aware (mature)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All the above, plus dynamic risk-based access decisions, just-in-time privilege elevation, and continuous monitoring of authorisation patterns. The identity layer itself is a primary source of security intelligence.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In my observation, most regulated enterprises sit in the
&amp;quot;transitional&amp;quot; stage, with the more mature security organisations
either at &amp;quot;target&amp;quot; or actively working through the gap to it. The
move from &amp;quot;target&amp;quot; to &amp;quot;mature&amp;quot; requires meaningful additional
investment and is appropriate for organisations with elevated threat profiles.&lt;/p&gt;
&lt;p&gt;For most organisations, the right ambition is reaching the
&amp;quot;identity-first&amp;quot; stage and operating sustainably at it. The further
maturity stage is a refinement, not a transformation.&lt;/p&gt;
&lt;h2&gt;Implications for the architecture function&lt;/h2&gt;
&lt;p&gt;Three broader implications.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The architecture function&#39;s role in security has expanded.&lt;/strong&gt; The
identity-first posture is fundamentally an architectural posture,
not a tooling decision. The architecture function carries
substantial responsibility for ensuring the posture is achievable
across the estate, that the necessary integration patterns are
documented and adopted, and that the exceptions to the posture
are explicitly approved rather than silently tolerated. This is
a meaningful expansion of the function&#39;s remit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The boundary between architecture and security is becoming
less useful as an organisational construct.&lt;/strong&gt; In several
organisations I have observed working through this shift, the
historical separation between the architecture function and the
security function has become a source of friction. The security
function holds the policy authority. The architecture function
holds the implementation authority across the estate. The work
of bringing the estate into alignment with the identity-first
posture requires close, sustained partnership between the two
functions.&lt;/p&gt;
&lt;p&gt;A pragmatic operating model that has emerged in better-organised
enterprises is the establishment of a joint security architecture
practice — a small standing forum, with named representatives from
both functions, that takes joint ownership of the policy-to-pattern
translation and of the implementation governance across the estate.
This is not a structural reorganisation; it is a working model that
respects the two functions&#39; distinct authorities while ensuring
they operate in step.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The investment case is different from the historical security
investment case.&lt;/strong&gt; Identity-first security does not have the
visible, dramatic justification that perimeter security had — the
&amp;quot;we built a wall, attackers were stopped at the wall&amp;quot; narrative.
Its value is in the absence of harm, in the reduced blast radius
of compromises that do occur, and in the cleaner regulatory
posture that follows. This is a harder case to make to a finance
function looking for clear quantifiable benefits.&lt;/p&gt;
&lt;p&gt;The recommendation for architecture leaders working through this
case is to anchor it in specific compliance and audit outcomes
where they exist, in the demonstrable reduction in privileged
access exposure as a measurable input to the regulatory reporting
cycle, and in the operational simplification that comes from
consolidating to a single identity provider. The case can be
made; it requires more careful articulation than the historical
security investment case did.&lt;/p&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;The identity-first posture is not a project. It is a sustained
shift in how the enterprise treats access control, with implications
across application design, vendor selection, audit, and the
organisational relationship between architecture and security. The
organisations that have made the transition cleanly are quietly
better placed for the regulatory environment of the next decade
than those that have not.&lt;/p&gt;
&lt;p&gt;For architecture leaders who have not yet taken an explicit
position on the maturity stage they are operating at and the one
they intend to reach, that conversation is overdue. The work is
substantial; the path is well-trodden; the alternative is to
remain in a security model that the rest of the industry has
moved past.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Architectural fitness functions: a practical framework for measuring enterprise architecture health</title>
    <link href="https://tarun.bulchandanis.com/blog/architectural-fitness-functions-framework/"/>
    <id>https://tarun.bulchandanis.com/blog/architectural-fitness-functions-framework/</id>
    <updated>2026-07-02T00:00:00.000Z</updated>
    <published>2026-07-02T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Enterprise architecture has long been criticised for the difficulty of measuring its impact. Architectural fitness functions offer a structured approach to this challenge. A working framework, with examples across six categories, for leaders looking to bring measurement discipline to their architecture practice.</summary>
    <content type="html">&lt;h2&gt;Executive summary&lt;/h2&gt;
&lt;p&gt;The persistent challenge facing enterprise architecture functions
is not the absence of strategy or the absence of documentation, but
the difficulty of demonstrating measurable impact. A capability
model is not a metric. A target operating model is not a measurement.
A roadmap is a plan, not an outcome.&lt;/p&gt;
&lt;p&gt;Architectural fitness functions — a concept introduced by Neal Ford,
Rebecca Parsons and Patrick Kua in their work on evolutionary
architecture — provide a structured response to this challenge. A
fitness function is a measurable, ideally automated indicator of
whether a given architectural property is being preserved or
improved over time. The concept is not new; the practice, in most
organisations I have observed, remains immature.&lt;/p&gt;
&lt;p&gt;This piece sets out a practical framework, organised across six
categories, that an architecture leader can adopt to bring
measurement discipline to their function. Each category includes
specific example metrics, the data sources required to compute
them, and a brief commentary on common pitfalls.&lt;/p&gt;
&lt;h2&gt;The case for fitness functions&lt;/h2&gt;
&lt;p&gt;The traditional measurements applied to architecture functions —
project delivery times, system uptime, vendor consolidation
savings — have several shortcomings. They are outcome measures of
the broader IT function rather than of the architectural choices
specifically. They are often lagging indicators by some margin.
And they tend to reward stability over improvement, which is the
inverse of what an architecture function ought to be incentivised
to deliver.&lt;/p&gt;
&lt;p&gt;Fitness functions, by contrast, are designed to be leading
indicators of architectural health. They measure properties that
the architecture function has direct influence over, on a cadence
short enough to drive corrective action, and in a form that can be
discussed productively with non-architects.&lt;/p&gt;
&lt;p&gt;The implementation requirement is modest. Most fitness functions
can be computed from data the organisation already produces: source
control activity, deployment logs, system metadata, cost dashboards,
and security scans. The marginal cost of producing the measurements
is small. The marginal benefit, particularly when the measurements
are tracked over time and shared with the broader leadership team,
is substantial.&lt;/p&gt;
&lt;p&gt;What follows is a framework of six categories, each with three or
four illustrative fitness functions. The framework is intended as
a starting point; organisations should adapt and extend it to their
specific context.&lt;/p&gt;
&lt;h2&gt;Category 1: Architectural alignment&lt;/h2&gt;
&lt;p&gt;Measurements of the degree to which the actual estate corresponds
to the stated architecture.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fitness function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Data source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Standards conformance rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The proportion of applications or services in the portfolio that conform to the published architectural standards (cloud-only deployment, container-based runtime, mandatory observability instrumentation, and similar).&lt;/td&gt;
&lt;td&gt;Application portfolio metadata; CMDB; infrastructure tagging.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Capability coverage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The proportion of the published capability model that has at least one named owning application or service.&lt;/td&gt;
&lt;td&gt;Capability model; portfolio mapping.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reference architecture adoption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The proportion of new applications shipped in a given quarter that follow the published reference architecture, weighted by complexity.&lt;/td&gt;
&lt;td&gt;Architecture review records; deployment metadata.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Exception backlog&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The number of approved architectural exceptions currently active, and the median age of an open exception.&lt;/td&gt;
&lt;td&gt;Exception register; AGB records.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The objective in this category is not to drive every metric to
100%. Some level of deviation from the standard is healthy — it
reflects the architecture function&#39;s response to genuine business
need rather than rigid enforcement. The objective is to surface
the trend. A standards conformance rate that has been declining
for three quarters is a signal that warrants investigation,
regardless of the absolute level.&lt;/p&gt;
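&lt;p&gt;As an illustration, the conformance rate and the trend check can each be computed in a few lines of Python. The record shape is hypothetical, and the sketch assumes the portfolio metadata already carries a per-application conformance flag. Three consecutive quarter-on-quarter declines need four data points.&lt;/p&gt;

```python
def conformance_rate(applications):
    """Share of applications flagged as conforming to published standards."""
    if not applications:
        return 0.0
    conforming = sum(1 for app in applications if app["conforms"])
    return conforming / len(applications)


def declining_for(rates, quarters=3):
    """True if the rate fell in each of the last `quarters` comparisons.

    `rates` is ordered oldest to newest, one value per quarter.
    """
    if quarters > len(rates) - 1:
        return False  # not enough history to judge the trend
    recent = rates[-(quarters + 1):]
    return all(earlier > later for earlier, later in zip(recent, recent[1:]))
```

&lt;p&gt;A series of 0.90, 0.88, 0.85, 0.81 triggers the signal even though the absolute level is still high. A series that hovers around 0.60 without falling consistently does not.&lt;/p&gt;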
&lt;h2&gt;Category 2: Operational characteristics&lt;/h2&gt;
&lt;p&gt;Measurements of the system properties that the architecture is
intended to produce.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fitness function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Data source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment frequency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The median number of production deployments per service per week, across the portfolio.&lt;/td&gt;
&lt;td&gt;CI/CD pipeline logs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lead time for change&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The median elapsed time from code commit to production deployment, across the portfolio.&lt;/td&gt;
&lt;td&gt;Source control; deployment pipeline.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mean time to recover&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The median time from incident detection to incident resolution, weighted by severity.&lt;/td&gt;
&lt;td&gt;Incident management system.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Change failure rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The proportion of production deployments that result in a degraded customer experience or a rollback.&lt;/td&gt;
&lt;td&gt;Deployment pipeline; incident records.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These four are the DORA metrics, well-established and widely
adopted. They are operational measures rather than purely
architectural ones, but the architecture function has substantial
influence over each: deployment frequency is constrained by the
architecture&#39;s ability to be deployed independently, lead time is
constrained by coupling and integration complexity, and so on.&lt;/p&gt;
&lt;p&gt;The architecture function should track these metrics not as a
substitute for the engineering organisation tracking them, but as
a leading indicator of where architectural intervention may be
warranted. A persistent low deployment frequency in a particular
domain is often a symptom of an architectural problem the team
has stopped trying to fix.&lt;/p&gt;
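&lt;p&gt;A rough Python sketch of how three of these measures might be derived from deployment records is shown below. The record fields and the sample data are invented for the example; a real pipeline would expose its own schema.&lt;/p&gt;

```python
from collections import Counter
from datetime import datetime
from statistics import median

# Invented sample records covering one week.
SAMPLE = [
    {"service": "payments", "committed_at": datetime(2026, 6, 1, 9),
     "deployed_at": datetime(2026, 6, 1, 11), "failed": False},
    {"service": "payments", "committed_at": datetime(2026, 6, 2, 9),
     "deployed_at": datetime(2026, 6, 2, 13), "failed": True},
    {"service": "ledger", "committed_at": datetime(2026, 6, 3, 9),
     "deployed_at": datetime(2026, 6, 3, 19), "failed": False},
]


def lead_time_hours(deployments):
    """Median hours from code commit to production deployment."""
    return median(
        (d["deployed_at"] - d["committed_at"]).total_seconds() / 3600
        for d in deployments
    )


def change_failure_rate(deployments):
    """Share of deployments flagged as causing degradation or a rollback."""
    return sum(1 for d in deployments if d["failed"]) / len(deployments)


def weekly_deployment_frequency(deployments, weeks):
    """Median deployments per service per week over the observed window.

    Services with no deployments in the window do not appear in the
    records, so they are not counted here; join against the service
    catalogue if they should pull the median down.
    """
    per_service = Counter(d["service"] for d in deployments)
    return median(count / weeks for count in per_service.values())
```

&lt;p&gt;Grouping the same calculations by domain instead of across the whole portfolio is what exposes the persistently slow areas described above.&lt;/p&gt;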
&lt;h2&gt;Category 3: Technical debt and modernisation&lt;/h2&gt;
&lt;p&gt;Measurements of the estate&#39;s evolution toward, or away from, a
modern technical baseline.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fitness function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Data source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;End-of-life exposure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The number of production services running on technology versions that are within twelve months of, or already past, vendor end-of-life.&lt;/td&gt;
&lt;td&gt;Vulnerability scanning; CMDB.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security patch latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The median time between a critical security advisory being published and the patched version being deployed to production, across the portfolio.&lt;/td&gt;
&lt;td&gt;Vulnerability management system.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependency currency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The median age of the dependencies used across the portfolio, relative to the latest stable releases.&lt;/td&gt;
&lt;td&gt;Software bill of materials; dependency scanning.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Modernisation rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The proportion of the legacy application portfolio that has been retired, replaced, or substantially modernised in the trailing twelve months.&lt;/td&gt;
&lt;td&gt;Application portfolio; project records.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This category is often the most politically charged. End-of-life
exposure in particular is a measure that frequently surfaces
uncomfortable realities. The discipline is to publish the
measurement, agree the threshold above which intervention is
required, and track progress against it. The measurement itself
does not produce the modernisation; it produces the conversation
that funds the modernisation.&lt;/p&gt;
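&lt;p&gt;By way of example, end-of-life exposure reduces to a date comparison once each service is tagged with the end-of-life date of its oldest component. The Python sketch below assumes such a tag exists, which is often the hard part in practice, and counts services already past end-of-life as well as those inside the twelve-month window.&lt;/p&gt;

```python
from datetime import date, timedelta


def end_of_life_exposure(services, today, horizon_days=365):
    """Names of services at or inside the end-of-life horizon, sorted.

    Each service is a dict with a "name" and an "eol_date"; both field
    names are hypothetical.
    """
    cutoff = today + timedelta(days=horizon_days)
    return sorted(s["name"] for s in services if cutoff >= s["eol_date"])
```

&lt;p&gt;The length of the returned list is the published number; the names are what the funding conversation will ask for next.&lt;/p&gt;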
&lt;h2&gt;Category 4: Cost and resource efficiency&lt;/h2&gt;
&lt;p&gt;Measurements of the architecture&#39;s economic characteristics.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fitness function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Data source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost per business transaction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The total infrastructure and operating cost attributed to a defined business transaction (a customer onboarding, an order processed, a report generated), measured monthly.&lt;/td&gt;
&lt;td&gt;Cloud billing; transaction logging.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloud utilisation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The proportion of provisioned cloud capacity that is meaningfully utilised in a given month.&lt;/td&gt;
&lt;td&gt;Cloud monitoring.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vendor concentration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The number of distinct vendors providing comparable capabilities across the portfolio, and the cost weighting across them.&lt;/td&gt;
&lt;td&gt;Contract register; cost allocation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI workload economics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;For organisations with significant generative AI workloads, the cost per inference, the prompt-cache hit rate, and the percentage of cost attributable to retries.&lt;/td&gt;
&lt;td&gt;LLM gateway logs; cost allocation.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The last of these four is increasingly relevant. Generative AI
workloads have a cost profile that is unusually sensitive to small
architectural decisions — the discipline around prompt caching, the
choice of model for a given task, the design of retrieval — and
these decisions are the architecture function&#39;s territory. A fitness
function focused on AI workload economics provides the visibility
that lets the architecture function intervene before costs become
material.&lt;/p&gt;
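&lt;p&gt;The three measures named in the table can be derived from per-call gateway records. The Python sketch below assumes a hypothetical record with a cost, a cache-hit flag, and a retry flag, and treats every call, retries included, as one inference; a given gateway may define these differently.&lt;/p&gt;

```python
def ai_workload_economics(calls):
    """Summarise cost per inference, cache hit rate and retry cost share.

    Each call is a dict with "cost", "cache_hit" and "is_retry" keys;
    these field names are assumptions for illustration.
    """
    if not calls:
        raise ValueError("no calls recorded for the period")
    total_cost = sum(c["cost"] for c in calls)
    retry_cost = sum(c["cost"] for c in calls if c["is_retry"])
    cache_hits = sum(1 for c in calls if c["cache_hit"])
    return {
        "cost_per_inference": total_cost / len(calls),
        "cache_hit_rate": cache_hits / len(calls),
        "retry_cost_share": retry_cost / total_cost if total_cost else 0.0,
    }
```

&lt;p&gt;Tracked month on month, a rising retry share or a falling cache hit rate tends to show up here before it shows up on the cloud bill.&lt;/p&gt;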
&lt;h2&gt;Category 5: Security and compliance posture&lt;/h2&gt;
&lt;p&gt;Measurements of the estate&#39;s security and regulatory standing.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fitness function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Data source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Identity coverage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The proportion of production systems integrated with the organisation&#39;s enterprise identity provider, as opposed to maintaining local user accounts.&lt;/td&gt;
&lt;td&gt;Identity provider; CMDB.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Secrets sprawl&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The number of secrets in the secrets management system, the number found outside it (in environment files, configuration repositories, etc.), and the ratio between the two.&lt;/td&gt;
&lt;td&gt;Secrets scanner; vault audit logs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audit log completeness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The proportion of production systems producing audit logs that meet the organisation&#39;s published retention and detail requirements.&lt;/td&gt;
&lt;td&gt;Logging infrastructure.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Privileged access exposure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The number of standing privileged access grants across the production estate, and the proportion of privileged access activity that is just-in-time provisioned.&lt;/td&gt;
&lt;td&gt;Identity provider; PAM solution.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For organisations subject to material regulatory oversight, these
measurements are independently useful as inputs to the regulatory
reporting and audit cycle. The architecture function&#39;s responsibility
here is to set the target, not necessarily to operate the
measurement infrastructure, which generally sits with the security
function.&lt;/p&gt;
&lt;h2&gt;Category 6: Knowledge and decision quality&lt;/h2&gt;
&lt;p&gt;Measurements of the architecture function&#39;s documentation and
decision-making practice itself.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fitness function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Data source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Decision throughput&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The number of Architecture Decision Records authored per quarter, normalised by the size of the architecture function.&lt;/td&gt;
&lt;td&gt;Documentation repository.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Decision lead time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The median elapsed time from a decision being proposed to being formally accepted.&lt;/td&gt;
&lt;td&gt;ADR metadata.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Knowledge accessibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The proportion of architecture documentation that has been queried via the internal knowledge system in the trailing month.&lt;/td&gt;
&lt;td&gt;Documentation analytics; LLM assistant logs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Onboarding effectiveness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A periodic survey-based measure of how quickly new architects feel productive after joining the function, with a target benchmark.&lt;/td&gt;
&lt;td&gt;Internal survey.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These measurements address the architecture function&#39;s own
operating model, which is rarely measured but is materially
important. An architecture function with a slow decision lead time
becomes the bottleneck it was supposed to alleviate. An architecture
function whose knowledge base is not being consulted is not earning
its keep as a custodian of organisational memory.&lt;/p&gt;
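&lt;p&gt;As an illustration of how lightweight such a measurement can be, the sketch below computes decision lead time as a median over ADR metadata. The record layout, the identifiers, and the sample dates are assumptions made for the example, not a prescribed schema.&lt;/p&gt;

```python
from datetime import date
from statistics import median

# Hypothetical ADR metadata: (identifier, date proposed, date accepted).
# The tuple layout is an assumption made for this sketch.
ADR_DATES = [
    ("ADR-101", date(2026, 1, 5), date(2026, 1, 12)),
    ("ADR-102", date(2026, 1, 20), date(2026, 2, 19)),
    ("ADR-103", date(2026, 2, 2), date(2026, 2, 5)),
]


def decision_lead_time_days(records):
    """Return the median number of days from proposal to acceptance."""
    return median((accepted - proposed).days for _, proposed, accepted in records)


print(decision_lead_time_days(ADR_DATES))
```

&lt;p&gt;The median is used rather than the mean so that a single long-running decision does not dominate the figure.&lt;/p&gt;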
&lt;h2&gt;Implementation considerations&lt;/h2&gt;
&lt;p&gt;Five practical points for organisations adopting this framework.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Begin with a small set.&lt;/strong&gt; Six categories with three to four
metrics each is eighteen to twenty-four measurements. That is too
many to operationalise at once. I would recommend selecting one
metric from each category as the initial set, establishing the
data pipeline and the publication cadence, and then extending
once the practice is established.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Publish the measurements visibly.&lt;/strong&gt; The benefit of fitness
functions accrues from the conversation they generate, not from
the measurement itself. The measurements should be visible to the
leadership team, ideally as a standing item on the relevant
governance forum. A dashboard that exists but is not reviewed has
no effect.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Establish thresholds, not just measurements.&lt;/strong&gt; Each fitness
function should have a target threshold, the level above which
the architecture function considers the property to be in a healthy
state, and a trigger threshold, below which intervention is
required. For measures where lower is better, such as decision lead
time, the directions reverse. Without thresholds, the measurements
become decorative.&lt;/p&gt;
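&lt;p&gt;A minimal sketch of the two-threshold idea, assuming a measure where higher is better. The intermediate &lt;code&gt;watch&lt;/code&gt; band, the class and function names, and the numbers are illustrative assumptions rather than part of the framework.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Target and trigger levels for one fitness function (higher is better)."""
    target: float   # at or above this, the property is considered healthy
    trigger: float  # below this, intervention is required


def classify(value, threshold):
    """Map a measurement to 'healthy', 'watch', or 'intervene'."""
    if value >= threshold.target:
        return "healthy"
    if value >= threshold.trigger:
        return "watch"
    return "intervene"


# Illustrative numbers only: identity coverage as a proportion of systems.
identity_coverage = Threshold(target=0.95, trigger=0.80)
for measured in (0.97, 0.88, 0.71):
    print(measured, classify(measured, identity_coverage))
```

&lt;p&gt;Keeping the thresholds as data alongside the measurement definition makes them easy to publish and to revisit when the framework is reviewed.&lt;/p&gt;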
&lt;p&gt;&lt;strong&gt;Treat the measurements as inputs to decisions, not as performance
indicators of individuals.&lt;/strong&gt; The temptation to use fitness functions
as performance management indicators for engineers or architects
should be resisted. The measurements are diagnostic; they identify
where the architecture needs attention, not who is to blame for it
needing attention. Using them as individual performance metrics
will produce the predictable behavioural distortions and will
degrade the quality of the measurement over time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Review the framework annually.&lt;/strong&gt; The set of fitness functions
that matters to an organisation evolves as the organisation
evolves. A measurement that was critical eighteen months ago may
have served its purpose. A new measurement may now be needed.
The architecture leadership should review the framework on a
defined cadence, retiring measurements that have ceased to provide
value and adding new ones as needed.&lt;/p&gt;
&lt;h2&gt;Implications for architecture leaders&lt;/h2&gt;
&lt;p&gt;The broader implication of adopting a fitness functions framework
is that the architecture function moves from a function defined by
its deliverables — the artefacts it produces — to a function
defined by its measurable outcomes. This is, in my view, a
necessary evolution for the discipline.&lt;/p&gt;
&lt;p&gt;The architecture function that can demonstrate, with data, that
the estate&#39;s standards conformance is improving, that technical
debt is being addressed at a defined rate, that cloud utilisation
is rising and unit costs are falling, that the security posture
is strengthening, and that the function&#39;s own decision throughput
is healthy, has a fundamentally different conversation with the
executive team than the function that produces an annual capability
model refresh and a target operating model that nobody reads.&lt;/p&gt;
&lt;p&gt;For architecture leaders considering this shift, the recommendation
is to begin small, to publish openly, and to allow the measurements
to drive the conversation rather than to dictate the conclusions.
The framework above is one starting point. The work of adapting
it to a specific organisational context is itself a useful exercise
in articulating what the architecture function is for.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>The evolving role of architecture decision records in the age of generative AI</title>
    <link href="https://tarun.bulchandanis.com/blog/architecture-decision-records-generative-ai/"/>
    <id>https://tarun.bulchandanis.com/blog/architecture-decision-records-generative-ai/</id>
    <updated>2026-06-25T00:00:00.000Z</updated>
    <published>2026-06-25T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Architecture Decision Records have re-emerged as one of the most useful artefacts in the modern enterprise architecture practice. Generative AI has not replaced the discipline; it has changed its economics and, in doing so, its place in the operating model. A practical perspective for architecture leaders.</summary>
    <content type="html">&lt;h2&gt;Executive summary&lt;/h2&gt;
&lt;p&gt;Architecture Decision Records — short, structured documents capturing
why a particular design choice was made, what was considered, and
what was rejected — have been an established practice for the better
part of a decade. They have become, quietly, one of the most useful
artefacts in any mature architecture function. They are inexpensive
to write, easy to read, and hold their value across personnel
changes in a way that few other architecture deliverables do.&lt;/p&gt;
&lt;p&gt;What has changed in the last eighteen months is the cost structure
of producing them. The combination of generative AI tooling, modern
source-control workflows, and the maturing ecosystem of architecture
linters means that the marginal cost of authoring an ADR has fallen
substantially. This has changed the economics of the practice and,
with it, the right operating model.&lt;/p&gt;
&lt;p&gt;This piece sets out how I have seen ADR practice evolve, the five
considerations I would recommend any architecture leader work through
when reviewing their own approach, and the implications for the
broader architecture function.&lt;/p&gt;
&lt;h2&gt;The state of ADR practice in 2026&lt;/h2&gt;
&lt;p&gt;The ADR pattern, in its simplest form, has not changed since
Michael Nygard&#39;s original write-up in 2011. A record captures a
decision title, the context in which it was made, the decision
itself, the alternatives considered, and the consequences. It is
stored alongside the code or configuration it relates to, typically
in a directory called &lt;code&gt;docs/adr&lt;/code&gt; or equivalent, and is treated as
an immutable artefact: superseded decisions are replaced by new
records that reference the original, not by edits in place.&lt;/p&gt;
&lt;p&gt;What has changed is the surrounding context. Three observations
frame the rest of this piece.&lt;/p&gt;
&lt;p&gt;First, &lt;strong&gt;the volume of architecture decisions has increased
materially&lt;/strong&gt;. The combination of microservices proliferation, cloud
service sprawl, and the now-routine integration of generative AI
components into enterprise systems has multiplied the number of
choice points at which an ADR is the appropriate response. A
mid-sized enterprise that ten years ago might have produced a
dozen ADRs per year will, in a comparable function today, produce
many times that number.&lt;/p&gt;
&lt;p&gt;Second, &lt;strong&gt;the cost of producing each individual ADR has fallen&lt;/strong&gt;.
Modern coding assistants can draft a reasonable first version from
a short briefing, producing the structural elements
(context, options, decision, consequences) in seconds. The
architect&#39;s time is no longer consumed by the structural drafting;
it is consumed by the substantive review and the judgement calls
that the assistant cannot make.&lt;/p&gt;
&lt;p&gt;Third, &lt;strong&gt;the readership has broadened&lt;/strong&gt;. ADRs were once read
primarily by other architects. The combination of expanded technical
literacy across product and engineering teams, the rise of internal
LLM-based knowledge tooling that surfaces ADRs in response to
natural-language questions, and the broader push for transparency
in technical decision-making means that the audience for an ADR
today extends well beyond the original architecture community.&lt;/p&gt;
&lt;p&gt;The combined effect is a practice that is more valuable than it
has ever been and is being executed at greater scale and lower
unit cost. The risk is not that ADRs become irrelevant. The risk
is that they become voluminous, inconsistently authored, and
poorly governed — the same trap that documentation practices have
fallen into before.&lt;/p&gt;
&lt;h2&gt;Five considerations for an evolving ADR practice&lt;/h2&gt;
&lt;p&gt;In conversations with architecture leaders across industries over
the past year, five considerations recur. Each merits an explicit
position in the architecture function&#39;s operating model.&lt;/p&gt;
&lt;h3&gt;1. The role of the LLM in the drafting process&lt;/h3&gt;
&lt;p&gt;There is no longer a question of whether generative AI tooling will
be involved in ADR authorship. It will. The question is at what
point in the drafting workflow, with what guardrails, and with
what attribution.&lt;/p&gt;
&lt;p&gt;A workable pattern I have seen in practice is the following. The
architect, or the engineer making the decision, briefs the LLM on
the context — typically a short paragraph or a set of bullet points
describing what is being decided and why. The LLM produces a first
draft. The draft is reviewed, edited substantively, and signed off
by a named human author. The record is committed to source control
with a co-authorship attribution that makes the AI involvement
explicit. The reviewer in a subsequent ADR review process is aware
that the document originated as an AI draft.&lt;/p&gt;
&lt;p&gt;The pitfalls to avoid in this pattern are familiar. The first is
under-editing — accepting the draft as written when it would not
have been accepted from a human contributor. The second is the
opposite: over-editing, which loses the time efficiency the
assistance was meant to provide. The third is the loss of decision
provenance, where the rationale captured in the ADR is the assistant&#39;s
plausible synthesis rather than the actual reasoning of the
decision-maker. All three are addressable through review discipline,
but they require explicit attention.&lt;/p&gt;
&lt;h3&gt;2. The standard template&lt;/h3&gt;
&lt;p&gt;Many organisations have evolved their ADR template over time, often
adding fields specific to their context. In an AI-assisted authorship
model, the consistency of the template matters more than it did,
because the assistant performs significantly better when working to
a well-defined structure.&lt;/p&gt;
&lt;p&gt;I would recommend any architecture leader formally codify the
organisation&#39;s ADR template, publish it as a Markdown file in the
documentation repository, and ensure that the prompt used to brief
the AI assistant references that template explicitly. A template
that includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Title and unique identifier&lt;/li&gt;
&lt;li&gt;Status (Proposed, Accepted, Superseded, Deprecated)&lt;/li&gt;
&lt;li&gt;Date and author(s)&lt;/li&gt;
&lt;li&gt;Context (the situation requiring the decision)&lt;/li&gt;
&lt;li&gt;Decision (the position taken)&lt;/li&gt;
&lt;li&gt;Options considered (with a brief assessment of each)&lt;/li&gt;
&lt;li&gt;Consequences (positive and negative)&lt;/li&gt;
&lt;li&gt;Related records (links to relevant prior ADRs)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;provides sufficient structure for an LLM to produce useful drafts,
and is parseable by tooling for downstream uses such as automated
dashboards or compliance reporting.&lt;/p&gt;
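&lt;p&gt;The template fields listed above can be sketched as a small data structure that renders to Markdown. This is one possible shape under stated assumptions: the class name &lt;code&gt;ADR&lt;/code&gt;, the field names, the heading layout, and the sample record are all hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Status(Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    SUPERSEDED = "Superseded"
    DEPRECATED = "Deprecated"


@dataclass
class ADR:
    identifier: str
    title: str
    status: Status
    decided_on: date
    authors: list
    context: str
    decision: str
    options: dict        # option name mapped to a brief assessment
    consequences: list
    related: list = field(default_factory=list)

    def to_markdown(self):
        """Render the record with one heading per template field."""
        lines = [
            f"# {self.identifier}: {self.title}",
            f"- Status: {self.status.value}",
            f"- Date: {self.decided_on.isoformat()}",
            f"- Authors: {', '.join(self.authors)}",
            "## Context", self.context,
            "## Decision", self.decision,
            "## Options considered",
            *[f"- {name}: {note}" for name, note in self.options.items()],
            "## Consequences",
            *[f"- {item}" for item in self.consequences],
            "## Related records",
            *[f"- {ref}" for ref in self.related],
        ]
        return "\n".join(lines)


# Sample record; every value here is invented for illustration.
record = ADR(
    identifier="ADR-0042",
    title="Route model calls through one gateway",
    status=Status.PROPOSED,
    decided_on=date(2026, 6, 1),
    authors=["A. Architect"],
    context="Several teams call model providers directly.",
    decision="All calls go through a shared gateway.",
    options={"Direct calls": "no cost visibility", "Shared gateway": "one control point"},
    consequences=["Central cost reporting", "One more hop per request"],
    related=["ADR-0017"],
)
print(record.to_markdown())
```

&lt;p&gt;Because every field is named and typed, the same structure can feed a dashboard or a compliance report without re-parsing free text.&lt;/p&gt;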
&lt;h3&gt;3. The review and approval workflow&lt;/h3&gt;
&lt;p&gt;The review process is where the practice most often breaks down at
scale. Two common patterns are worth being explicit about.&lt;/p&gt;
&lt;p&gt;The first is &lt;strong&gt;decision-level review&lt;/strong&gt;, where every ADR is reviewed
by an Architecture Governance Board or equivalent. This works well
at low volumes but becomes a bottleneck at scale. It also tends to
shift the centre of gravity in the practice from &amp;quot;documenting
decisions made by the team&amp;quot; to &amp;quot;decisions made by the AGB and
documented retrospectively&amp;quot;, which is a subtle but meaningful
inversion.&lt;/p&gt;
&lt;p&gt;The second is &lt;strong&gt;categorical review&lt;/strong&gt;, where ADRs are classified at
authorship into tiers — typically something like &amp;quot;team-level&amp;quot;
(merged with code review), &amp;quot;domain-level&amp;quot; (reviewed by the relevant
domain architect), and &amp;quot;enterprise-level&amp;quot; (reviewed by the AGB).
This pattern scales more comfortably and preserves the principle
that the team closest to the decision is the one capturing it. It
requires a clear classification rubric, which should itself be
published.&lt;/p&gt;
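&lt;p&gt;A classification rubric can be as simple as a published function. The sketch below is a hypothetical rubric keyed on how many teams and domains a decision affects; the criteria and the name &lt;code&gt;review_tier&lt;/code&gt; are assumptions for illustration, and a real rubric would reflect the organisation&#39;s own structure.&lt;/p&gt;

```python
def review_tier(teams_affected, domains_affected):
    """Classify an ADR into a review tier under a hypothetical rubric."""
    if domains_affected >= 2:
        return "enterprise-level"   # reviewed by the AGB
    if teams_affected >= 2:
        return "domain-level"       # reviewed by the relevant domain architect
    return "team-level"             # merged with code review


print(review_tier(teams_affected=1, domains_affected=1))
print(review_tier(teams_affected=3, domains_affected=1))
print(review_tier(teams_affected=5, domains_affected=2))
```

&lt;p&gt;Whatever the criteria, expressing them as an explicit rule makes the tiering decision reproducible by the author at the point of writing.&lt;/p&gt;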
&lt;p&gt;For organisations adopting AI-assisted authorship at scale, I
would expect categorical review to become the more common pattern.
The volume of ADRs that the function will need to handle is
unlikely to be sustainable under decision-level review.&lt;/p&gt;
&lt;h3&gt;4. ADRs as a queryable corpus&lt;/h3&gt;
&lt;p&gt;Once an organisation has a meaningful body of ADRs in source
control — say, a hundred or more across the architecture function
— the corpus itself becomes a valuable asset. A common use case
emerging in practice is the integration of the ADR corpus with an
internal LLM-based assistant, allowing architects and engineers to
ask natural-language questions such as &amp;quot;what is our position on
microservices boundaries for transaction-processing workloads&amp;quot; or
&amp;quot;have we previously rejected the use of a particular technology,
and on what grounds&amp;quot;.&lt;/p&gt;
&lt;p&gt;This use case is straightforward to implement with current
retrieval-augmented generation tooling. The architectural
requirements are modest: a consistent metadata schema across ADRs,
a well-defined storage layout, an indexing pipeline (which can be
as simple as a nightly job), and an integration point with the
organisation&#39;s preferred internal assistant platform.&lt;/p&gt;
&lt;p&gt;The benefit, in my observation, is twofold. First, decisions are
re-applied consistently across teams, reducing the rate at which
the same question is debated repeatedly in different contexts.
Second, new architects joining the function have a much faster
on-ramp to the organisation&#39;s established positions, which would
otherwise be tribal knowledge.&lt;/p&gt;
&lt;h3&gt;5. The retention and supersession policy&lt;/h3&gt;
&lt;p&gt;ADRs are immutable, but they are not eternal. An ADR that
established the organisation&#39;s position on a now-obsolete
technology is a historical record, not a current standard. The
architecture function needs a clear policy on supersession: when
a new ADR supersedes an older one, the older record&#39;s status changes
to &amp;quot;Superseded by ADR-NNN&amp;quot;, its content is left untouched, and it
is not deleted.&lt;/p&gt;
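&lt;p&gt;A minimal sketch of supersession, assuming records are held as immutable values keyed by identifier. The &lt;code&gt;Record&lt;/code&gt; class, the identifiers, and the titles are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    identifier: str
    title: str
    status: str


def supersede(old, new_identifier):
    """Return a copy of the old record whose status points at its successor."""
    return replace(old, status=f"Superseded by {new_identifier}")


# Invented sample corpus: the old record is kept, only its status changes.
corpus = {"ADR-014": Record("ADR-014", "Use a shared message broker", "Accepted")}
corpus["ADR-014"] = supersede(corpus["ADR-014"], "ADR-052")
corpus["ADR-052"] = Record("ADR-052", "Use per-domain event streams", "Accepted")

for record in corpus.values():
    print(record.identifier, record.status)
```

&lt;p&gt;The older record stays in the corpus with its title and content intact, so the history of the position remains traceable.&lt;/p&gt;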
&lt;p&gt;Less commonly discussed but equally important is the retention
policy for ADRs that have been superseded for a long time. The
straightforward answer is that ADRs are kept indefinitely. Source
control storage is inexpensive, and the historical value of being
able to trace the evolution of the organisation&#39;s architecture
position over time is meaningful, particularly in regulated
contexts where post-hoc audit may require reconstructing the
reasoning behind a decision made some years prior.&lt;/p&gt;
&lt;p&gt;The architecture function should publish its supersession and
retention policy as part of the ADR practice documentation.&lt;/p&gt;
&lt;h2&gt;Implications for the architecture function&lt;/h2&gt;
&lt;p&gt;Three broader implications for architecture leaders.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The architecture function&#39;s documentation discipline is becoming
a competitive advantage.&lt;/strong&gt; In a context where decision velocity is
high and the cost of capturing decisions has fallen, the
organisations that have established robust documentation practices
will accumulate a strategic asset over time — a queryable record
of why their architecture is what it is. The organisations that
have not will increasingly struggle with consistency, with
onboarding, and with the regulatory expectations that are
emerging in several sectors.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The skills profile of architects is shifting.&lt;/strong&gt; The skill of
producing a clean ADR draft has been substantially commoditised
by AI tooling. The skill of recognising when an ADR is required,
of asking the right questions to surface the actual decision
context, and of facilitating a substantive review discussion has
become correspondingly more important. Architecture leaders
should consider this shift in their hiring and development plans.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The relationship between the architecture function and
engineering teams is changing.&lt;/strong&gt; ADRs that are written by engineering
teams with AI assistance, reviewed by domain architects, and
escalated to the AGB only for genuinely cross-cutting decisions,
represent a meaningful redistribution of architectural authorship
across the organisation. This is, in my view, a positive evolution.
The architecture function&#39;s role becomes more about facilitation
and curation than about authorship, which is a better use of senior
architectural expertise.&lt;/p&gt;
&lt;h2&gt;A note on related practice&lt;/h2&gt;
&lt;p&gt;Architecture Decision Records sit alongside several other
documentation practices that have evolved in parallel over the past
decade — capability models, value stream maps, technology radars,
and the broader category of internal technical writing. The
considerations above apply, with appropriate adjustments, to each
of these. The opportunity for architecture leaders is to take a
coherent view across the whole documentation practice rather than
treating ADRs as a standalone discipline.&lt;/p&gt;
&lt;p&gt;The pieces that follow in this series will examine that broader
practice — including the role of fitness functions in measuring
architectural health, and the changing shape of identity and
security architecture — through the same lens of considered,
practical evolution.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>What an acquisition-heavy company actually needs from its architects</title>
    <link href="https://tarun.bulchandanis.com/blog/acquisition-heavy-architecture/"/>
    <id>https://tarun.bulchandanis.com/blog/acquisition-heavy-architecture/</id>
    <updated>2026-06-18T00:00:00.000Z</updated>
    <published>2026-06-18T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Textbook enterprise architecture practice is calibrated for stable enterprises. It doesn&#39;t fit roll-up companies acquiring three to ten businesses per year. A working framework for the architecture function inside a PE-style acquisition machine.</summary>
    <content type="html">&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;p&gt;There is a particular kind of company that the standard enterprise
architecture playbook does not fit. Private-equity-backed roll-ups
and other acquisition-heavy growth businesses acquire several small
companies per year and integrate them at varying degrees of depth.
The textbook EA function — the operating model, the Architecture
Governance Board, the capability model, the standards forum — is
calibrated for a stable enterprise where the underlying portfolio
changes slowly. In a roll-up, the portfolio changes by a third every
two years. The textbook function spends its first eighteen months
catching up to the perimeter and then never gets ahead of it.&lt;/p&gt;
&lt;p&gt;This piece sets out what I think the architecture function actually owns
in an acquisition-heavy business, what it deliberately does not own,
and a working framework for the part it does own. It is opinionated
because the pattern is under-discussed; I have not seen good public
writing on it and I have made most of the mistakes I describe.&lt;/p&gt;
&lt;p&gt;The argument in one sentence: in an acquisition-heavy company, the
architecture function&#39;s job is not to enforce a target end-state.
It is to make the next acquisition cheaper than the last one, the
one after cheaper still, and the seventh one cheap enough that
nobody asks architecture for permission.&lt;/p&gt;
&lt;h2&gt;The pattern&lt;/h2&gt;
&lt;p&gt;An acquisition-heavy business is one where the company acquires
three to ten other businesses per year, each one small relative to
the parent, and integrates them at some level. The acquisitions
might be in the same business vertical (the classic roll-up: many
small dental practices, many small accounting firms, many small
HVAC companies, many small renewable-energy operators) or in
adjacent verticals (a platform business absorbing complementary
capabilities). The economic logic is different in each case but
the architecture problem is similar.&lt;/p&gt;
&lt;p&gt;What is similar:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The parent company starts with a single set of systems, usually
of decent quality, often standardised.&lt;/li&gt;
&lt;li&gt;Each acquisition arrives with its own stack. The stack is almost
always a mix of: one or two SaaS products the team genuinely
needs, several SaaS products they tolerate, a couple of internal
applications that nobody understands, and a thick layer of
spreadsheets that turn out to be load-bearing.&lt;/li&gt;
&lt;li&gt;The acquisition closes legally before the technology has been
fully understood, let alone integrated.&lt;/li&gt;
&lt;li&gt;Operating leverage from the acquisition depends on integration
happening, but the timeline pressure is intense and the
integration team is small.&lt;/li&gt;
&lt;li&gt;The next acquisition arrives before the current one is fully
integrated.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That last bullet is the one that makes this distinct from
single-large-acquisition integration work. A merger of equals or a
single-bet acquisition is a one-time event with a defined end. A
roll-up is a continuous flow. The architecture function is
running a pipeline, not delivering a project.&lt;/p&gt;
&lt;h2&gt;Why textbook EA doesn&#39;t fit&lt;/h2&gt;
&lt;p&gt;A standard enterprise architecture function builds toward a steady
state. It defines the target operating model, populates the
capability model, sets standards, and runs a governance board that
keeps drift in check over time. The model assumes a portfolio that
is mostly stable and evolves gradually.&lt;/p&gt;
&lt;p&gt;This doesn&#39;t survive contact with continuous acquisition. Three
things break:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The target end-state assumption breaks.&lt;/strong&gt; Defining a target end
state requires that the inputs to the model are stable enough to
plan against. They are not. Six months into a target-state
definition, two new acquisitions have arrived with stacks that
weren&#39;t in the model. The target state has to be re-cut. After
three or four re-cuts, the team gives up on target-state work
and the operating model becomes &amp;quot;react to what&#39;s in front of us&amp;quot;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The standards body breaks.&lt;/strong&gt; Standards are useful when they apply
to a long-lived portfolio. They are less useful when half the
portfolio has been part of the company for less than two years and
came in with a stack the standards never anticipated. The Architecture
Governance Board ends up either approving everything (rubber stamp)
or refusing everything (bottleneck). Neither produces useful
governance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The capability model breaks.&lt;/strong&gt; The capability model is supposed
to be a stable spine. Acquired businesses have capabilities that
overlap with the parent&#39;s but at different boundaries, different
granularities, different ownership patterns. Trying to force the
acquired business&#39;s capability map onto the parent&#39;s is the kind
of work that produces three months of consultancy fees and zero
business value. The opposite — keeping every acquisition&#39;s model
separate — produces a sprawl with no integration story.&lt;/p&gt;
&lt;p&gt;The textbook function isn&#39;t wrong; it is calibrated for a different
problem. The problem an acquisition-heavy company actually has is
not &amp;quot;design the target state&amp;quot;. It is &amp;quot;make the integration of
the next acquisition cheap, fast, and reversible&amp;quot;.&lt;/p&gt;
&lt;h2&gt;What architecture actually owns in this context&lt;/h2&gt;
&lt;p&gt;The architecture function in an acquisition-heavy business owns
three specific things and explicitly does not own several others.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Owns: the integration pattern.&lt;/strong&gt; The repeatable playbook for how
an acquired business connects to the parent&#39;s data, systems, and
processes. Not a one-off project plan; a pattern that can be
parameterised for each acquisition. The pattern includes the
sequence (what happens by days 30, 60, 90, and 180), the technical
decisions (which acquired systems are kept, which are sunset, which
are migrated), the governance touchpoints (which integration decisions
require AGB sign-off and which are routine), and the rollback
posture (what to do if the integration uncovers problems that
weren&#39;t in due diligence).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Owns: the federated data model.&lt;/strong&gt; A small, deliberately limited
data model at the parent level that lets the parent&#39;s systems talk
about acquired businesses without requiring every acquired business
to conform to a single master schema. The model captures the
minimum data the parent needs (legal entity, business unit, revenue
attribution, customer-of-record, employee count, financial
consolidation point) and lets each acquired business retain its
own operational schemas underneath. The architecture function is
the keeper of the federation, not the harmoniser.&lt;/p&gt;
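&lt;p&gt;To make the federation idea concrete, here is a sketch of the parent-level model as a small record type, with one adapter per acquired business. The field names follow the list in the paragraph above; the adapter &lt;code&gt;from_dental_group&lt;/code&gt;, the source column names, and the sample row are invented for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FederatedEntity:
    """The minimum the parent needs to know about an acquired business."""
    legal_entity: str
    business_unit: str
    revenue_attribution: str
    customer_of_record: str
    employee_count: int
    consolidation_point: str


def from_dental_group(row):
    """Adapter for one hypothetical acquired business and its own field names."""
    return FederatedEntity(
        legal_entity=row["registered_name"],
        business_unit=row["region"],
        revenue_attribution=row["pnl_code"],
        customer_of_record=row["billing_entity"],
        employee_count=int(row["headcount"]),
        consolidation_point=row["ledger"],
    )


# The acquired business keeps operational fields the parent never sees.
row = {
    "registered_name": "Example Dental Ltd",
    "region": "North",
    "pnl_code": "DEN-N",
    "billing_entity": "Example Dental Ltd",
    "headcount": "42",
    "ledger": "UK-GL-01",
    "chair_utilisation": 0.81,
}
entity = from_dental_group(row)
print(entity)
```

&lt;p&gt;Each acquisition adds an adapter rather than a schema migration, which is the practical difference between keeping a federation and harmonising.&lt;/p&gt;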
&lt;p&gt;&lt;strong&gt;Owns: the vendor consolidation calendar.&lt;/strong&gt; The acquired businesses
arrive with overlapping vendor relationships. The parent has its
own. The right answer is rarely &amp;quot;consolidate everything to the
parent&#39;s vendors on day one&amp;quot;; the right answer is a calendar that
sequences vendor consolidation against renewal dates, integration
priorities, and the architecture function&#39;s own bandwidth. The
calendar is a living artefact, updated each quarter.&lt;/p&gt;
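&lt;p&gt;A sketch of the simplest possible calendar, sequencing inherited contracts by renewal quarter. It covers only the renewal-date dimension; integration priorities and bandwidth would need further inputs. The vendor names, dates, and function names are hypothetical.&lt;/p&gt;

```python
from datetime import date

# Hypothetical contracts inherited from acquired businesses.
CONTRACTS = [
    {"vendor": "CRM-A", "business": "Acquisition 3", "renewal": date(2027, 2, 1)},
    {"vendor": "Payroll-B", "business": "Acquisition 1", "renewal": date(2026, 9, 15)},
    {"vendor": "CRM-C", "business": "Acquisition 2", "renewal": date(2026, 11, 30)},
]


def quarter_label(day):
    """Return a label such as '2026-Q3' for a calendar date."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def consolidation_calendar(contracts):
    """Group vendors by renewal quarter, earliest renewal first."""
    calendar = {}
    for contract in sorted(contracts, key=lambda c: c["renewal"]):
        label = quarter_label(contract["renewal"])
        calendar.setdefault(label, []).append(contract["vendor"])
    return calendar


for quarter, vendors in consolidation_calendar(CONTRACTS).items():
    print(quarter, vendors)
```

&lt;p&gt;Regenerating the calendar each quarter from the current contract list keeps it a living artefact rather than a one-off plan.&lt;/p&gt;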
&lt;p&gt;What architecture explicitly does not own in this context:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Does not own: the target operating model.&lt;/strong&gt; That belongs to the
operating partners and the COO. Architecture is a contributor, not
the owner. Target operating models in roll-ups are political
artefacts as much as technical ones; the architecture function does
not have the authority or the visibility to drive them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Does not own: the integration project plan for any specific
acquisition.&lt;/strong&gt; That belongs to the integration project lead, who
is usually a programme manager attached to the M&amp;amp;A function.
Architecture provides the pattern; it does not execute every
instance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Does not own: the capability model in any strong sense.&lt;/strong&gt; The
capability model in an acquisition-heavy company is a useful
sketch, not a source of truth. Investing heavily in it produces
diminishing returns. Keep it lightweight.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Does not own: standards enforcement on acquired companies during
their first year.&lt;/strong&gt; This is the most counter-cultural part of the
framework. Acquired companies should be left alone for their first
year on standards questions, with very specific exceptions
(security, data residency, regulatory compliance, identity
integration). The exceptions matter; the rest does not. The
architecture function that tries to enforce its standards on a
just-acquired business in month three loses both the trust of the
acquired team and the political capital it needs for the integration
that does matter.&lt;/p&gt;
&lt;h2&gt;The 30-60-90-180 pattern&lt;/h2&gt;
&lt;p&gt;A repeatable acquisition-integration playbook, at the architecture
layer. The numbers are days from legal close.&lt;/p&gt;
&lt;h3&gt;Days 1–30: Discover&lt;/h3&gt;
&lt;p&gt;What happens at the architecture layer: a 30-day discovery exercise
on the acquired stack. The output is a written assessment that goes
into the integration team&#39;s hands.&lt;/p&gt;
&lt;p&gt;Specific things the discovery covers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Identity and access.&lt;/strong&gt; What identity provider does the acquired
business use? How many user accounts? Where are admin credentials
held? What is the offboarding process for departing staff?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data inventory.&lt;/strong&gt; What datasets does the acquired business
produce, and which are now jointly owned with the parent under
the acquisition agreement? Where is this data physically stored?
Is any of it subject to regulatory or contractual constraints?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Application portfolio.&lt;/strong&gt; What applications are in production?
Who uses each? Which are SaaS, which are self-hosted, which are
custom-built and unmaintained? Which have active vendor support?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vendor relationships.&lt;/strong&gt; What contracts exist? When are renewals?
What are the costs? Are any of the contracts inherited from a
previous owner that we don&#39;t have visibility into?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The shadow stack.&lt;/strong&gt; The spreadsheets, the scripts, the
developer-laptop tools that aren&#39;t on the official inventory but
are load-bearing. These are usually 30% of the actual operation
and 0% of the documented one.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The deliverable at day 30 is a written assessment, not a
recommendation. The recommendation comes after the integration team
has had its parallel commercial and operational reviews.&lt;/p&gt;
&lt;h3&gt;Days 30–60: Decide&lt;/h3&gt;
&lt;p&gt;What happens at the architecture layer: a sequenced decision on
the acquired stack, with specific categorisation of each application.&lt;/p&gt;
&lt;p&gt;The decision categories I have found useful:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Keep, integrate.&lt;/strong&gt; The acquired application is good enough that
it stays, but it needs to be integrated with the parent&#39;s identity,
data, and operational tooling. Most operational applications fall
here.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keep, federate.&lt;/strong&gt; The acquired application stays as a separate
island, with a thin integration to the parent&#39;s federated data
model but no deeper integration. Often the right answer for
vertical-specific applications where the acquired business knows
its own domain better than the parent does.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Migrate.&lt;/strong&gt; The acquired application is going away; users
migrate to the parent&#39;s equivalent over a defined timeline. The
parent&#39;s application must be capable enough to absorb the acquired
use case.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sunset.&lt;/strong&gt; The acquired application is going away with no direct
replacement because it was solving a problem the parent solves
differently or doesn&#39;t have. Usually a small minority of cases.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Conditional.&lt;/strong&gt; The application is on probation. Decision deferred
for six months while the integration team learns whether the
underlying business process needs to change.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each application in the acquired stack gets one of these labels.
The label drives the integration plan. The labels are formally
signed off at the AGB.&lt;/p&gt;
&lt;h3&gt;Days 60–90: Integrate the non-negotiables&lt;/h3&gt;
&lt;p&gt;What happens at the architecture layer: identity, data residency,
security baseline, financial consolidation. The minimum integration
required to run the acquired business as a subsidiary.&lt;/p&gt;
&lt;p&gt;This is the part of integration that does not wait for the full
plan. Identity has to be unified within ninety days for offboarding
to work. Security baseline has to be hit so the acquired company
isn&#39;t a hole in the parent&#39;s posture. Financial consolidation has
to be wired so that month-end works. These are non-negotiable and
the timeline is tight.&lt;/p&gt;
&lt;p&gt;The application-level integrations from the day-60 decisions come
later. Days 60–90 is about the floor, not the ceiling.&lt;/p&gt;
&lt;h3&gt;Days 90–180: Execute the application plan&lt;/h3&gt;
&lt;p&gt;What happens at the architecture layer: the categorised application
decisions from day 60 are now project work. The architecture
function transitions from designer to consultant: the integration
team executes the plan; architecture is on call for the decisions
that come up during execution.&lt;/p&gt;
&lt;p&gt;By day 180, most of the application-level integration work is
either complete or has a clear plan with named owners and timelines.
The acquired business is operating as a normal subsidiary, with
clear reporting lines into the parent, with the parent&#39;s
identity and security in place, and with a clear roadmap for the
remaining work.&lt;/p&gt;
&lt;p&gt;Past day 180, the acquired business should be unremarkable from
an architecture perspective. The function&#39;s attention should be
on the next acquisition, not the previous one.&lt;/p&gt;
&lt;h2&gt;Vendor consolidation: the actual mechanic&lt;/h2&gt;
&lt;p&gt;The single highest-value piece of work the architecture function
does in an acquisition-heavy business is vendor consolidation.
Not capability modelling, not target-state architecture, not
standards forums. Vendor consolidation. Because every acquired
business arrives with vendor contracts the parent is now paying
for, and the durable savings — the actual EBITDA contribution from
the M&amp;amp;A work — come from rationalising those contracts.&lt;/p&gt;
&lt;p&gt;The mechanic, in detail:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A central register of every contract across every entity.&lt;/strong&gt;
Updated within ninety days of each acquisition. The register
records vendor, product, scope, value, renewal date, contractual
notice period, and the named relationship owner. This sounds
obvious. Most companies do not have it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A renewal calendar that is the input to the consolidation
pipeline.&lt;/strong&gt; Eight weeks before any contract renewal, the
architecture function reviews whether the contract should be
renewed as-is, renegotiated, consolidated with the parent&#39;s
equivalent contract, or terminated. The decision is informed by
the application-level decisions from day 60.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A consolidation pipeline that processes one to three contracts
per quarter, on average.&lt;/strong&gt; Not all at once; in sequence. The
pipeline is sized to the architecture function&#39;s bandwidth, not
to the theoretical number of consolidations available. Trying
to do twelve consolidations in a quarter produces zero
consolidations and a lot of stalled work.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;An EBITDA tracker that reports the saved cost back to the
finance function quarterly.&lt;/strong&gt; This is the political win that
buys the architecture function the credibility to keep doing
the work. Without the tracker, the savings are invisible and
the function&#39;s value is unmeasurable.&lt;/li&gt;
&lt;/ul&gt;
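&lt;p&gt;The register and the renewal calendar above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any real system: the &lt;code&gt;Contract&lt;/code&gt; class, its field names, and the &lt;code&gt;due_for_review&lt;/code&gt; helper are hypothetical. The only things taken from the mechanic itself are the register fields and the eight-week lead time.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from dataclasses import dataclass
from datetime import date, timedelta

# Lead time taken from the renewal calendar above: eight weeks.
REVIEW_LEAD = timedelta(weeks=8)


@dataclass(frozen=True)
class Contract:
    # One row of the central register; field names are illustrative.
    entity: str              # group company that holds the contract
    vendor: str
    product: str
    scope: str
    annual_value: float
    renewal_date: date
    notice_period_days: int
    relationship_owner: str

    @property
    def review_date(self) -&gt; date:
        # Day on which the architecture review of this contract opens.
        return self.renewal_date - REVIEW_LEAD


def due_for_review(register: list[Contract], today: date) -&gt; list[Contract]:
    # Contracts whose review window has opened and which have not yet renewed,
    # soonest renewal first.
    due = [c for c in register if c.review_date &lt;= today &lt; c.renewal_date]
    return sorted(due, key=lambda c: c.renewal_date)


# Example with made-up data.
register = [
    Contract(&quot;AcquiredCo&quot;, &quot;VendorA&quot;, &quot;CRM&quot;, &quot;all staff&quot;, 120_000.0,
             date(2026, 3, 1), 30, &quot;J. Doe&quot;),
]
print([c.vendor for c in due_for_review(register, date(2026, 1, 10))])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Run on a schedule against the register, a query like this yields the contracts entering the consolidation pipeline. The sketch leaves one design question open: where a contract&#39;s notice period is longer than eight weeks, the review presumably has to open earlier than this window does.&lt;/p&gt;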
&lt;p&gt;In the contexts where I have seen this work, the architecture
function delivers high-single-digit-percentage operating savings
per year from this work alone, year after year. It
is the most reliable value the function produces. It is also
mostly invisible from the outside, which is fine.&lt;/p&gt;
&lt;h2&gt;The federated data model&lt;/h2&gt;
&lt;p&gt;The structurally hardest piece of architecture work in an
acquisition-heavy business is the data model. Two failure modes
sit on either side of the right answer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Failure mode 1: every acquired company conforms.&lt;/strong&gt; The parent
defines a master schema and every acquired business is required
to migrate onto it. This produces six- to eighteen-month migration
projects per acquisition, none of which deliver business value
during the migration, all of which produce friction with the
acquired teams, and many of which fail outright when the acquired
business&#39;s actual operational needs do not fit the master schema.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Failure mode 2: no integration data model at all.&lt;/strong&gt; The parent
runs its own systems; each acquired company runs its own; nobody
attempts to harmonise. This makes consolidated reporting impossible,
makes financial close take three weeks instead of three days, and
makes any cross-business operational decision require manual
data wrangling.&lt;/p&gt;
&lt;p&gt;The right answer is a deliberately small federated model. A
specific list of data entities that the parent needs to know about
at the corporate level, with a clean schema, and explicit federation
contracts with each acquired business. The acquired businesses keep
their operational schemas; they map them to the federated model on
a defined cadence (daily, weekly, monthly depending on the entity).&lt;/p&gt;
&lt;p&gt;The entities I would include in a minimum federated model:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Legal entity.&lt;/strong&gt; Which legal company is each business operating
under. Hierarchy of legal entities. Tax registrations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Customer of record.&lt;/strong&gt; Unique customer identifier across the
group, where the same customer exists in multiple businesses.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Employee.&lt;/strong&gt; All employees of any group company, mapped to the
identity provider.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Application.&lt;/strong&gt; What applications exist across the group, with
ownership.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vendor.&lt;/strong&gt; What vendor contracts exist across the group, joined
to applications.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Financial transaction (aggregated).&lt;/strong&gt; Revenue, cost, and
EBITDA at the granularity the CFO needs for consolidation. Not
individual line items.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That is six entities. Most attempts at a federated model
go to twenty or thirty entities and collapse under the weight.
Six is enough for ninety percent of the consolidation use cases
and small enough to actually maintain.&lt;/p&gt;
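&lt;p&gt;A federation contract of the kind described above can be made concrete with a small sketch. Everything named here is an assumption for illustration: the &lt;code&gt;FederationContract&lt;/code&gt; class, the &lt;code&gt;Cadence&lt;/code&gt; values, and the snake_case entity names are hypothetical. Only the six entities and the daily, weekly, or monthly cadences come from the text.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Cadence(Enum):
    DAILY = timedelta(days=1)
    WEEKLY = timedelta(weeks=1)
    MONTHLY = timedelta(days=31)


# The six entities of the minimum federated model.
FEDERATED_ENTITIES = (
    &quot;legal_entity&quot;, &quot;customer_of_record&quot;, &quot;employee&quot;,
    &quot;application&quot;, &quot;vendor&quot;, &quot;financial_transaction_aggregated&quot;,
)


@dataclass(frozen=True)
class FederationContract:
    business: str          # acquired business supplying the data
    entity: str            # must be one of FEDERATED_ENTITIES
    cadence: Cadence       # how often the mapping must be refreshed
    last_delivered: date   # date of the most recent delivery

    def __post_init__(self):
        # Refuse anything outside the deliberately small model.
        if self.entity not in FEDERATED_ENTITIES:
            raise ValueError(f&quot;not a federated entity: {self.entity}&quot;)

    def is_stale(self, today: date) -&gt; bool:
        # True once more than one cadence period has passed since delivery.
        return today - self.last_delivered &gt; self.cadence.value
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The validation in &lt;code&gt;__post_init__&lt;/code&gt; is the point of the sketch: a seventh entity cannot be added without editing the list in one visible place, which is one way to resist drift towards twenty or thirty entities.&lt;/p&gt;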
&lt;h2&gt;The cultural component&lt;/h2&gt;
&lt;p&gt;A short and important note. The acquired teams have feelings about
what is happening to them. The integration team — and the
architecture function specifically — is the visible face of the
parent&#39;s decisions about the acquired business&#39;s stack. Those
decisions feel personal because they often involve sunsetting a
system the acquired team built or stopping using a vendor the
acquired team chose.&lt;/p&gt;
&lt;p&gt;The right posture is to listen first, make decisions second, and
communicate decisions transparently. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The day-30 discovery is a real conversation with the acquired
team, not just a read of the documentation. The conversation is
the input to the day-60 decision.&lt;/li&gt;
&lt;li&gt;The day-60 decision is communicated in writing, with reasons,
and is open to challenge in a specified window before it becomes
final. &amp;quot;Keep, federate&amp;quot; and &amp;quot;Conditional&amp;quot; are both used
deliberately because they signal &amp;quot;we are not yet making the
hard decision, and your input matters&amp;quot;. &amp;quot;Sunset&amp;quot; is used
sparingly and always with a stated reason, since by definition it
comes with no direct replacement.&lt;/li&gt;
&lt;li&gt;The acquired team retains influence over their stack for at least
the first ninety days, even where the parent&#39;s architecture
team has views. The exceptions are the non-negotiables (security,
identity, regulatory) where the parent&#39;s posture is final.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is not soft-skills theatre. It is the durable mechanism by
which the integration succeeds. The acquired team&#39;s institutional
knowledge of why their stack looks the way it does is the most
valuable input the architecture function has. The function that
ignores that input makes worse decisions and loses the trust it
needs for the deeper integration work that comes later.&lt;/p&gt;
&lt;h2&gt;What it means for the architecture function&#39;s structure&lt;/h2&gt;
&lt;p&gt;The function in an acquisition-heavy business looks different from
the textbook function.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Smaller standards forum.&lt;/strong&gt; The Architecture Governance Board
meets monthly, not weekly, and focuses on the consequential
decisions: the day-60 categorisations, the federation contracts,
the security exceptions, the genuinely cross-business
architectural choices. Not the everyday-application questions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A dedicated acquisition architecture lead.&lt;/strong&gt; One person whose
full-time job is the integration pattern: shepherding each
acquisition through the 30-60-90-180 cycle. This role is the
thing most acquisition-heavy companies are missing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A vendor management partnership.&lt;/strong&gt; The architecture function
and the procurement function are tightly partnered, not at
arm&#39;s length. The contract register is co-owned. The consolidation
pipeline is co-run.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lightweight capability modelling.&lt;/strong&gt; A simple capability map
exists at the corporate level — fewer than fifty top-level
capabilities — and it is updated quarterly rather than as a
major artefact. It serves as orientation, not as a planning tool.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A clear separation between architecture and integration
execution.&lt;/strong&gt; Architecture is on the design and the pattern;
integration is on the execution. Both functions exist; they are
not the same team.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This shape is leaner than the textbook EA function and busier per
head. It is also the shape that actually fits the business model.&lt;/p&gt;
&lt;h2&gt;What this looks like when it is working&lt;/h2&gt;
&lt;p&gt;A short picture of the steady-state, for what to aim at.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each acquisition closes legally, and within 30 days the
architecture function has produced a written assessment of the
acquired stack.&lt;/li&gt;
&lt;li&gt;Within 60 days the AGB has signed off on the application-level
categorisations and the integration plan.&lt;/li&gt;
&lt;li&gt;Within 90 days the non-negotiables (identity, security,
data residency, financial consolidation) are in place.&lt;/li&gt;
&lt;li&gt;Within 180 days the application-level integrations are on a
named roadmap with owners.&lt;/li&gt;
&lt;li&gt;Vendor consolidation produces predictable EBITDA contribution
quarterly.&lt;/li&gt;
&lt;li&gt;The federated data model is stable; the entities in it are
unchanged from one year to the next.&lt;/li&gt;
&lt;li&gt;The architecture function is small relative to the size of the
group, but the integration pattern works repeatably.&lt;/li&gt;
&lt;li&gt;Each acquisition is cheaper to integrate than the last, because
the pattern is mature and the federation is established.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The signal that the function is working is not the absence of
incidents or the elegance of the target-state diagram. It is the
unit economics of integration. If the next acquisition costs less
to integrate than the previous one, the function is doing its
job. If not, the function is failing in some specific way that
the framework above lets you diagnose.&lt;/p&gt;
&lt;h2&gt;Why this isn&#39;t written about&lt;/h2&gt;
&lt;p&gt;The framework is under-discussed because it does not fit the
prestige patterns of enterprise architecture practice. The
prestige patterns — TOGAF certifications, capability-model
deep-dives, the long-form ADRs that get praised on LinkedIn — are
calibrated for stable enterprises. Acquisition-heavy work is more
operational, more about pattern repetition than about elegant
single artefacts, and more about EBITDA than about elegance.&lt;/p&gt;
&lt;p&gt;The people who do it well tend not to write about it. The architecture
press writes about the prestige patterns. The PE-backed roll-up
architects are heads-down doing the integration work and probably
not writing about it because the work itself is what they are
paid for.&lt;/p&gt;
&lt;p&gt;I have been on the inside of this pattern. The framework above is
mine; if it is useful in your context, I would like to hear how
it lands. If you are reading this and you are the architecture
lead inside a roll-up business, please consider writing about the
pattern as well. The discipline benefits from more honest writing
about what actually works.&lt;/p&gt;
&lt;h2&gt;Related&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/blog/case-study-meridian/&quot;&gt;Meridian: building the EA platform we couldn&#39;t buy&lt;/a&gt;
— describes the EA platform that was deliberately designed to
cope with a fast-changing portfolio rather than a stable one.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/blog/case-study-canvas/&quot;&gt;CANVAS: building the approval workflow no commercial product
covers&lt;/a&gt; — the workflow that wraps the
acquisition-onboarding gate at the application level.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/blog/ea-tool-market-18-months/&quot;&gt;The commercial EA tool market has 18 months&lt;/a&gt;
— the same shift, viewed from the vendor side.&lt;/li&gt;
&lt;/ul&gt;
</content>
  </entry>
  <entry>
    <title>Sovereign AI is mostly theatre. The actual technical question is data residency</title>
    <link href="https://tarun.bulchandanis.com/blog/sovereign-ai-theatre/"/>
    <id>https://tarun.bulchandanis.com/blog/sovereign-ai-theatre/</id>
    <updated>2026-06-11T00:00:00.000Z</updated>
    <published>2026-06-11T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>European &#39;sovereign AI&#39; announcements are increasingly political. The architecture questions buried inside them are technical and answerable. A framework for separating the two.</summary>
    <content type="html">&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;p&gt;&amp;quot;Sovereign AI&amp;quot; is the framing every European government and most
European enterprise software vendors are using in 2026 to describe
the goal of running AI infrastructure without dependence on US
hyperscalers. The political case is straightforward. The technical
delivery is much messier than the framing suggests. Most &amp;quot;sovereign
AI&amp;quot; propositions in the market today are either repackaging a US
provider behind a European brand, building a non-frontier-class
model with EU funding, or solving a different problem (data
residency, model provenance, supply chain) and calling the bundle
&amp;quot;sovereignty&amp;quot;.&lt;/p&gt;
&lt;p&gt;For an architect making real decisions in 2026 about where to run
inference, where to store the data that feeds the models, and what
to commit to in vendor contracts, the useful move is to separate
the political question from the technical questions. The political
question is whose flag is on the press release. The technical
questions are three: where does the inference physically run, where
does the data that the model sees physically live, and under whose
jurisdiction is the operating company subject to discovery, subpoena,
or compelled-access orders. Each of these has a clean answer for
a given deployment. None of them are answered by buying a &amp;quot;sovereign
AI&amp;quot; product per se.&lt;/p&gt;
&lt;p&gt;This piece is the framework I would use to make the call, with
specific patterns for the regulated enterprise context.&lt;/p&gt;
&lt;h2&gt;What &amp;quot;sovereign AI&amp;quot; usually means in 2026&lt;/h2&gt;
&lt;p&gt;The category is doing too much work. When a European government,
a hyperscaler, a regional cloud provider, or a vendor uses &amp;quot;sovereign
AI&amp;quot; in 2026, they typically mean one of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A European frontier model.&lt;/strong&gt; Mistral is the canonical example.
Aleph Alpha was an earlier one. The pitch is that the model itself
is European-trained, European-controlled, and competitive with the
US frontier. The reality is that the gap to the US frontier has
been narrowing on some metrics and widening on others, with the
result that &amp;quot;sovereign frontier model&amp;quot; is a credible product for
some workloads and not for others.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A European cloud running third-party models.&lt;/strong&gt; OVHcloud running Mistral.
Scaleway hosting Llama. The various &amp;quot;EU Bedrock-equivalent&amp;quot;
offerings from European hyperscalers. The pitch here is that the
infrastructure is European even if the model came from elsewhere.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A US hyperscaler running in EU regions with EU contractual
controls.&lt;/strong&gt; Microsoft&#39;s EU Data Boundary, AWS&#39;s European Sovereign
Cloud, Google&#39;s Sovereign Cloud arrangements. The pitch is that
the US provider can demonstrate that data stays in the EU and is
subject to EU contractual controls even though the operating
company is US.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A managed-services wrapper around any of the above.&lt;/strong&gt; The
service provider takes responsibility for &amp;quot;sovereignty&amp;quot; as a
service offering. The customer doesn&#39;t have to design the
controls themselves; they outsource the question to a vendor.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A pure marketing claim with no underlying architecture.&lt;/strong&gt; This
category exists. The vendor uses &amp;quot;sovereign AI&amp;quot; as a positioning
word with no specific technical commitment behind it. Worth being
able to recognise.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The five things are not the same. They have different cost profiles,
different risk profiles, different capability profiles, and different
legal postures. Treating &amp;quot;sovereign AI&amp;quot; as a single category obscures
all of that.&lt;/p&gt;
&lt;h2&gt;The political moment&lt;/h2&gt;
&lt;p&gt;It is worth being honest about why &amp;quot;sovereign AI&amp;quot; is having the
moment it is having.&lt;/p&gt;
&lt;p&gt;A combination of: the EU AI Act compliance regime now in active
enforcement (the second tranche of obligations landed in 2025, the
third in 2026), the broader European tech-sovereignty push that
predates AI, the geopolitical realignment around US-EU technology
relations under the current US administration, the specific
concerns that the Cloud Act creates for any EU data hosted by US
companies, and a cohort of European AI investors who genuinely
believe the next decade of model competition has space for
non-US-aligned alternatives.&lt;/p&gt;
&lt;p&gt;All of this is real and all of it produces real announcements
and real funding flows. Several of the announcements over the
past eighteen months are substantive. Several are political theatre.
The architect&#39;s job is to be able to tell them apart at the
specification level.&lt;/p&gt;
&lt;h2&gt;The three technical questions&lt;/h2&gt;
&lt;p&gt;The political framing — &amp;quot;is this sovereign?&amp;quot; — is not actionable.
The actionable questions are three. Every architecture decision
about AI deployment in a regulated European context can be
decomposed into them.&lt;/p&gt;
&lt;h3&gt;Question 1: Where does the inference physically run?&lt;/h3&gt;
&lt;p&gt;The literal compute. The GPUs. The data centre. The geographic
location of the silicon doing the matrix multiplications when your
prompt is being processed.&lt;/p&gt;
&lt;p&gt;This is the most concrete and most easily auditable of the three
questions. The answer can be a city, a region, or a country. For
EU regulatory purposes it usually needs to be a member state. For
some specific national rules (FCA outsourcing rules, some defence
contracts, some healthcare deployments) it needs to be the
specific country.&lt;/p&gt;
&lt;p&gt;The answer varies by vendor and by deployment configuration:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OpenAI API directly:&lt;/strong&gt; US, primarily. EU data residency is
available on enterprise contracts but the underlying inference
may still route through US-based infrastructure for some
workloads. Read the terms.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Anthropic API directly:&lt;/strong&gt; US, primarily. Similar enterprise
options.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Anthropic via AWS Bedrock:&lt;/strong&gt; Customer-controlled. Pick an EU
region; the inference runs there.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Anthropic via Google Vertex AI:&lt;/strong&gt; Customer-controlled. Pick
europe-west4 (Netherlands) or similar; the inference runs there.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Azure OpenAI:&lt;/strong&gt; Customer-controlled. Pick West Europe or
Sweden Central; the inference runs there.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mistral via Mistral La Plateforme:&lt;/strong&gt; France.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Self-hosted Mistral / Llama / Qwen on European infrastructure:&lt;/strong&gt;
Whichever region you deployed to.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The pattern: the question of where inference runs is a procurement
and configuration question, not a &amp;quot;sovereign vs not&amp;quot; question. You
can have EU-resident inference from US-based vendors. You can also
have non-EU-resident inference from European-branded products if
you don&#39;t configure carefully.&lt;/p&gt;
&lt;p&gt;For most regulated workloads, the answer should be an EU member
state and should be contractually committed. The vendor&#39;s
willingness to commit to this is itself a signal of whether they
are operating at enterprise grade.&lt;/p&gt;
&lt;h3&gt;Question 2: Where does the data the model sees physically live?&lt;/h3&gt;
&lt;p&gt;A different question. The model is doing inference somewhere. The
data the model is reasoning over — the documents in your RAG
corpus, the customer records the agent is acting on, the documents
attached to the prompt — is being sent to that inference endpoint.
The data must travel to where the inference runs. It may also be
cached, logged, or retained somewhere along the way.&lt;/p&gt;
&lt;p&gt;The data-residency question is therefore: across the full
inference path, where does the data live, even transiently?&lt;/p&gt;
&lt;p&gt;Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The corpus.&lt;/strong&gt; Where is the vector database? Where are the
source documents? Where are the embeddings stored?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The prompt and response in flight.&lt;/strong&gt; What region does the
network path traverse? Where are the load balancers?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The prompt and response at rest.&lt;/strong&gt; Does the vendor retain
prompts and responses? For how long? In what region?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The model weights, if you are fine-tuning.&lt;/strong&gt; Where are the
training datasets stored, where is the training compute, where
are the resulting weights stored?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For a typical RAG-based agent in 2026, the corpus is the part most
companies handle correctly (it lives in the company&#39;s own systems,
usually in a regional managed service) and the inference logs are
the part most companies handle incorrectly (the vendor&#39;s retention
defaults are often longer than the company&#39;s policy, and the
region may be different).&lt;/p&gt;
&lt;p&gt;The audit-grade answer to this question is: every byte of data
that touches the model — going in or coming out — has a specified
region of residence at every stage, and the vendor contract
matches the specification. Where the contract is silent, the
default is whatever the vendor&#39;s general infrastructure does, which
is usually US-routed.&lt;/p&gt;
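&lt;p&gt;The audit-grade answer amounts to comparing two tables stage by stage: what the company requires and what the vendor contract commits to. A minimal sketch of that comparison follows. The function name &lt;code&gt;residency_gaps&lt;/code&gt;, the stage keys, and region labels such as &lt;code&gt;eu-west&lt;/code&gt; are hypothetical placeholders, not real cloud region identifiers or any vendor&#39;s API.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Stages of the inference path, following the list above.
STAGES = (&quot;corpus&quot;, &quot;in_flight&quot;, &quot;at_rest&quot;, &quot;fine_tuning&quot;)


def residency_gaps(required, contracted):
    # required:   stage -&gt; set of acceptable regions (the company&#39;s specification)
    # contracted: stage -&gt; region the vendor contract commits to, or absent if silent
    gaps = {}
    for stage in STAGES:
        allowed = required.get(stage)
        if allowed is None:
            continue  # stage not in scope for this deployment
        region = contracted.get(stage)
        if region is None:
            # A silent contract means the vendor&#39;s default applies.
            gaps[stage] = &quot;contract silent: assume vendor default routing&quot;
        elif region not in allowed:
            gaps[stage] = f&quot;contracted region {region} not in {sorted(allowed)}&quot;
    return gaps


# Example with made-up values: retention at rest is not covered by the contract.
required = {s: {&quot;eu-west&quot;} for s in (&quot;corpus&quot;, &quot;in_flight&quot;, &quot;at_rest&quot;)}
contracted = {&quot;corpus&quot;: &quot;eu-west&quot;, &quot;in_flight&quot;: &quot;eu-west&quot;}
print(residency_gaps(required, contracted))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An empty result means the contract matches the specification at every stage in scope. Treating a missing commitment as a gap, rather than as a pass, mirrors the point about silent contracts.&lt;/p&gt;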
&lt;h3&gt;Question 3: Under whose jurisdiction is the operating company subject to compelled access?&lt;/h3&gt;
&lt;p&gt;The hardest of the three to reason about because it is about legal
process rather than technical configuration.&lt;/p&gt;
&lt;p&gt;A US company operating a cloud in the EU is, under the Cloud Act,
potentially subject to US compelled-access orders for data the
company holds, regardless of where the data is physically stored.
This is the core argument against US-headquartered hyperscalers
hosting truly sensitive EU data. The US company can be ordered by
a US court to produce the data, and complying with that order may
put them in conflict with EU law (specifically GDPR&#39;s restrictions
on third-country data transfers).&lt;/p&gt;
&lt;p&gt;The mitigations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;EU Data Boundary and equivalent.&lt;/strong&gt; Microsoft, AWS, and Google
have all rolled out variants of contractual and technical
arrangements designed to address Cloud Act concerns. The details
matter. Some of these arrangements are substantive (separate
legal entities incorporated in the EU, with EU staff,
encryption keys held by EU entities, and technical separation of the
EU infrastructure from US operations). Some are less so. Read
the actual terms.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;EU-incorporated operating companies.&lt;/strong&gt; A truly EU-resident
cloud — operated by an EU-incorporated company, with no US parent
in the legal structure, with no US staff with admin access — is
generally outside the reach of the Cloud Act. OVHcloud, Scaleway, Aruba, and
the various national cloud initiatives fall in this category.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On-premise deployment.&lt;/strong&gt; The company runs the inference inside
its own data centre, on its own infrastructure, using a model
it has the right to run (typically an open-weights model). No
third-party operator at all. No Cloud Act exposure because no
US company is involved.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The third question is the one where &amp;quot;sovereign AI&amp;quot; framing comes
closest to making sense, because it is genuinely about jurisdictional
sovereignty. But it is also the question with the largest gap
between political claim and technical reality. Most &amp;quot;sovereign AI&amp;quot;
offerings address questions 1 and 2 but punt on question 3.&lt;/p&gt;
&lt;h2&gt;A decision framework&lt;/h2&gt;
&lt;p&gt;For an architect making a deployment decision, the question is
not &amp;quot;is this sovereign&amp;quot;. It is: &amp;quot;for this specific workload, with
this specific sensitivity level, what answers do I need to questions
1, 2, and 3, and which deployment configurations satisfy them?&amp;quot;&lt;/p&gt;
&lt;p&gt;A reasonable framework, in order of escalating constraint:&lt;/p&gt;
&lt;h3&gt;Tier 0: Public, non-sensitive workloads&lt;/h3&gt;
&lt;p&gt;Marketing copy generation. Internal-document summarisation of
non-confidential material. Developer-tooling LLM use against
non-confidential code.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Answers needed:&lt;/strong&gt; Q1 and Q2 should be in-region (EU for EU
operations), via contractual commitment. Q3 is generally not a
concern at this tier.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Acceptable deployments:&lt;/strong&gt; Any major LLM vendor with an EU
inference region and standard enterprise terms. OpenAI in EU,
Anthropic via Bedrock EU, Mistral, Azure OpenAI in EU, Vertex AI
in EU.&lt;/p&gt;
&lt;h3&gt;Tier 1: Confidential business data&lt;/h3&gt;
&lt;p&gt;Internal architecture documents. Internal financial planning data.
Internal HR data (non-PII). Source code where commercial sensitivity
is moderate.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Answers needed:&lt;/strong&gt; Q1 in-region with contractual commitment. Q2
in-region with contractual commitment, including retention terms
and incident-response data handling. Q3 should be considered: a
US-headquartered hyperscaler running an EU region with strong
contractual controls is acceptable; pure consumer-tier vendor
endpoints are not.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Acceptable deployments:&lt;/strong&gt; Azure OpenAI in EU on an enterprise
contract with EU Data Boundary attestation. Anthropic via Bedrock
EU on a similar contract. Mistral La Plateforme for workloads
where the model quality is sufficient. Self-hosted Llama or Mistral
on EU-resident infrastructure for workloads where data-residency
control is paramount.&lt;/p&gt;
&lt;h3&gt;Tier 2: Regulated PII, financial customer data, healthcare records&lt;/h3&gt;
&lt;p&gt;Customer transaction data. Patient records. KYC documentation.
Anything covered by financial-services or healthcare regulation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Answers needed:&lt;/strong&gt; Q1, Q2, and Q3 all need strong answers. Q3
is now actually the binding constraint: the legal exposure to
US compelled access becomes a meaningful concern, depending on
the specific regulator&#39;s view.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Acceptable deployments:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For most workloads, US hyperscaler in EU region with full EU
Data Boundary controls is still acceptable, provided the company
is comfortable with the residual Cloud Act exposure. Most large
EU banks operate at this tier.&lt;/li&gt;
&lt;li&gt;For workloads where Cloud Act exposure is unacceptable: an
EU-incorporated cloud provider running an EU-developed model
(Mistral on OVHcloud, for example), or self-hosted infrastructure
with open-weights models.&lt;/li&gt;
&lt;li&gt;The decision is partly about the workload&#39;s regulatory exposure
and partly about the regulator&#39;s known posture on the question.
The Dutch DNB and the German BaFin have been more conservative
on US-hyperscaler exposure than some others. Calibrate to your
primary regulator.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Tier 3: Defence, national-security-adjacent, certain forms of intelligence&lt;/h3&gt;
&lt;p&gt;Out of scope for most of the readers of this post. The pattern is
on-premise, air-gapped, open-weights, with the entire inference
path under the operating company&#39;s physical control.&lt;/p&gt;
&lt;h2&gt;Where the European model providers actually win&lt;/h2&gt;
&lt;p&gt;A fair assessment of where Mistral and the smaller European players
genuinely deliver value rather than just political framing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;On data-residency questions where the customer wants to deal
with an EU-incorporated counterparty rather than a US one.&lt;/strong&gt; Mistral
La Plateforme is operated by a French company. Its contract is
governed by French law. There is no US holding company in the
chain. For Tier 2 workloads where Cloud Act exposure is the
specific concern, this is a real technical answer, not just
positioning.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;On model quality for non-frontier workloads.&lt;/strong&gt; Mistral&#39;s
current mid-tier models are competitive with US mid-tier models on
many evals. For workloads where you do not need frontier capability,
a European model is a viable choice on capability alone, and the
residency story is a bonus.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;On open-weights availability for self-hosting.&lt;/strong&gt; Mistral has
been one of the more consistent providers of weights that can
actually be deployed on customer infrastructure. For workloads
where self-hosting is the architectural requirement, the open
weights from Mistral and from Meta&#39;s Llama family are the practical
choice.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;On specific language strength.&lt;/strong&gt; Some European models have
strength in specific European languages (French, German, Spanish)
that exceeds that of the major US models. For workloads where the working
language is something other than English, this is occasionally
a meaningful capability difference.&lt;/p&gt;
&lt;h2&gt;Where they don&#39;t matter&lt;/h2&gt;
&lt;p&gt;A fair assessment of where the European positioning is mostly
marketing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;On frontier-class capability.&lt;/strong&gt; As of mid-2026, the frontier
of capability — the most demanding agentic reasoning, the most
complex tool use, the most reliable code generation — sits with
the major US frontier models. The gap has been narrowing on some
benchmarks and not narrowing on others. If your workload genuinely
requires frontier capability, the European alternatives are not
yet substitutable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;On managed enterprise tooling around the model.&lt;/strong&gt; The vendor
ecosystem around the major US providers — observability tools,
prompt-management platforms, evaluation frameworks, deployment
patterns, integration platforms — is significantly more mature
than the equivalent ecosystem around European providers. This
matters for production deployments at scale.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;On Q3 if the workload&#39;s regulatory exposure is low.&lt;/strong&gt; For most
Tier 0 and Tier 1 workloads, the Cloud Act concern is theoretical.
Choosing a European provider specifically to address Q3 when Q3
is not actually a binding constraint is over-engineering.&lt;/p&gt;
&lt;h2&gt;A concrete recommendation&lt;/h2&gt;
&lt;p&gt;If I were the Chief Architect of a regulated European company today,
making the call about AI infrastructure for the next two years,
here is what I would actually do.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For Tier 0 workloads:&lt;/strong&gt; Standardise on whichever frontier model
the company has the strongest enterprise relationship with, deployed
in an EU region with standard contractual residency commitments.
Do not over-engineer the sovereignty question.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For Tier 1 workloads:&lt;/strong&gt; Same as Tier 0, with stronger contractual
commitments on data handling, retention, and incident response.
Take the Bedrock, Vertex AI, or Azure OpenAI path with explicit
region selection. Do
not run consumer-tier endpoints.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For Tier 2 workloads with Cloud Act exposure as a binding
constraint:&lt;/strong&gt; Mistral via La Plateforme, or a self-hosted
open-weights model on an EU-incorporated cloud. Accept the
capability trade-off; for these workloads, the regulatory clarity
is worth more than the marginal model quality.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For Tier 2 workloads where Cloud Act exposure is acceptable to
the regulator:&lt;/strong&gt; US hyperscaler in EU region with strong contractual
controls (EU Data Boundary, equivalent). Most of the EU regulated
sector is operating at this tier today and it is workable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For Tier 3 workloads:&lt;/strong&gt; Out of scope for this piece. Talk to your
sector-specific authority.&lt;/p&gt;
&lt;p&gt;The thing I would not do is pick a &amp;quot;sovereign AI&amp;quot; product because
the brand says sovereign. Read the contract, check the inference
region, check the data-residency commitments, check the operating
company&#39;s jurisdiction. Then make the call on the merits.&lt;/p&gt;
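&lt;p&gt;The tier recommendations above can be written down as a small lookup, which is sometimes easier to argue about in a design review than prose. This is a sketch only: the &lt;code&gt;Tier&lt;/code&gt; enum, the &lt;code&gt;Recommendation&lt;/code&gt; type and the &lt;code&gt;recommend&lt;/code&gt; function are names invented for this example, and the strings simply restate the text.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    # Sensitivity tiers as described in this post (hypothetical encoding).
    TIER_0 = 0
    TIER_1 = 1
    TIER_2 = 2
    TIER_3 = 3


@dataclass(frozen=True)
class Recommendation:
    deployment: str
    contract_notes: str


def recommend(tier: Tier, cloud_act_binding: bool = False) -> Recommendation:
    """Map a workload tier to the deployment pattern recommended above.

    cloud_act_binding only matters for Tier 2: set it when the regulator
    treats US compelled access as unacceptable for the workload.
    """
    if tier == Tier.TIER_0:
        return Recommendation(
            "frontier model in an EU region",
            "standard contractual residency commitments",
        )
    if tier == Tier.TIER_1:
        return Recommendation(
            "frontier model in an EU region, enterprise endpoint only",
            "stronger commitments on data handling, retention, incident response",
        )
    if tier == Tier.TIER_2 and cloud_act_binding:
        return Recommendation(
            "EU-incorporated provider or self-hosted open-weights model",
            "accept the capability trade-off for regulatory clarity",
        )
    if tier == Tier.TIER_2:
        return Recommendation(
            "US hyperscaler in an EU region",
            "EU Data Boundary or equivalent contractual controls",
        )
    # Tier 3 is out of scope in this post, so the sketch refuses to answer.
    raise ValueError("Tier 3 is out of scope; consult your sector-specific authority")


print(recommend(Tier.TIER_2, cloud_act_binding=True).deployment)
```

&lt;p&gt;The point of making the Tier 2 flag explicit is that it forces someone to record the regulator&#39;s posture as an input, rather than leaving it implicit in the vendor choice.&lt;/p&gt;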
&lt;h2&gt;The next two years&lt;/h2&gt;
&lt;p&gt;A prediction. The &amp;quot;sovereign AI&amp;quot; framing peaks in 2026 and 2027,
then transitions into a more technical conversation about
data-residency engineering as the political moment passes and the
real architecture decisions get made. The vendors that survive
this transition are the ones whose technical claims hold up — not
the ones with the best press release.&lt;/p&gt;
&lt;p&gt;Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Mistral becomes a credible mid-tier European choice for
regulated workloads.&lt;/strong&gt; Its frontier-model ambitions either succeed
or stall; either way, the mid-tier business is durable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The US hyperscalers&#39; EU Data Boundary arrangements get
contractually stronger as their European customer base demands
more.&lt;/strong&gt; Microsoft, AWS, and Google all add layers of EU-control
through 2026 and 2027. By 2028 these arrangements look much more
like proper sovereign clouds for most practical purposes,
although the Cloud Act exposure does not fully go away.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Open-weights models become the default for the most sensitive
workloads.&lt;/strong&gt; Self-hosting on EU infrastructure is the answer for
the cases where Q3 is the binding constraint. The ecosystem
around self-hosting (deployment patterns, observability,
evaluation, fine-tuning) matures rapidly through 2026 and 2027.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The political framing fades.&lt;/strong&gt; By 2028 &amp;quot;sovereign AI&amp;quot; is no
longer the headline framing. The conversation is about specific
technical commitments, just as the conversation about &amp;quot;cloud
sovereignty&amp;quot; — which had its own moment in 2018-2020 — became
technical rather than political over time.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The architects who do well in the next two years are the ones who
stay grounded in the three technical questions and don&#39;t get
distracted by the framing. The political conversation will resolve
itself. The technical decisions will be on the architecture
function&#39;s books long after.&lt;/p&gt;
&lt;p&gt;If you are reading vendor proposals right now with &amp;quot;sovereign AI&amp;quot;
in the title and you have not separately verified the answers to
Q1, Q2, and Q3 for the specific deployment configuration
being proposed, please go back and do that before you sign.&lt;/p&gt;
&lt;h2&gt;Related&lt;/h2&gt;
&lt;p&gt;This piece sits adjacent to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/blog/auditing-agent-decisions/&quot;&gt;How do you audit a decision an agent made? A working framework.&lt;/a&gt;
— The audit story for agent decisions is closely related to the
data-residency story; both are about where evidence lives and
whose jurisdiction it lives under.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/blog/mcp-enterprise-standard/&quot;&gt;MCP is the most important enterprise standard nobody is implementing.&lt;/a&gt;
— The integration layer through which inference happens; data
residency cuts across this directly.&lt;/li&gt;
&lt;/ul&gt;
</content>
  </entry>
  <entry>
    <title>MCP is the most important enterprise standard nobody is implementing</title>
    <link href="https://tarun.bulchandanis.com/blog/mcp-enterprise-standard/"/>
    <id>https://tarun.bulchandanis.com/blog/mcp-enterprise-standard/</id>
    <updated>2026-06-04T00:00:00.000Z</updated>
    <published>2026-06-04T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Model Context Protocol is eighteen months old, supported by every major model vendor, and the cleanest answer to the integration sprawl that AI agents are creating. Enterprise adoption is poor. Here is why, and what to do about it.</summary>
    <content type="html">&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;p&gt;Anthropic introduced the Model Context Protocol in November 2024.
By mid-2026 it has been adopted, in some form, by OpenAI, Google,
the major coding tools (Cursor, Claude Code, VS Code via Copilot
Chat), the major productivity vendors (Notion, Linear, Atlassian,
Microsoft via M365 Copilot extensions), and a long tail of
infrastructure providers. The protocol does one specific thing
well: it standardises how an LLM gets access to external tools,
context, and data, with a security and discovery model that
generalises across vendors.&lt;/p&gt;
&lt;p&gt;In consumer and developer-tooling contexts, MCP has won. In
enterprise contexts — by which I mean regulated companies running
agents in production over internal data — MCP adoption is poor.
Most internal agent builds I see in 2026 are still using
custom-per-vendor function calling, hand-rolled RAG pipelines,
direct API integrations with each downstream system, or some
proprietary middleware layer that the architecture function is
trying not to think about.&lt;/p&gt;
&lt;p&gt;This is a missed opportunity. MCP is the right abstraction. It
is more secure than the bespoke alternatives most companies are
running. It removes a category of vendor lock-in. It standardises
the audit story for tool calls. And the cost of adopting it is
low and falling.&lt;/p&gt;
&lt;p&gt;This piece covers what MCP is, why enterprise adoption has lagged,
the arguments against it and the answers to them, and three
integration patterns I would use today for a regulated enterprise.&lt;/p&gt;
&lt;h2&gt;What MCP is, in one paragraph&lt;/h2&gt;
&lt;p&gt;Model Context Protocol is an open protocol — JSON-RPC over a few
transport options, with a small set of standardised methods — that
defines how an AI agent (the &amp;quot;client&amp;quot;) connects to an external
data or capability provider (the &amp;quot;server&amp;quot;). The server exposes a
discoverable list of &lt;strong&gt;tools&lt;/strong&gt; (callable functions), &lt;strong&gt;resources&lt;/strong&gt;
(retrievable content), and optionally &lt;strong&gt;prompts&lt;/strong&gt; (reusable
templates). The client connects, discovers what is available,
and calls into the server during the model&#39;s reasoning loop.
The protocol handles authentication, capability discovery,
streaming, error handling, and — critically — the audit-relevant
metadata around every call.&lt;/p&gt;
&lt;p&gt;In effect, MCP is to AI tooling what LSP (the Language Server
Protocol) was to IDE tooling. LSP let any editor talk to any
language&#39;s tooling without each editor re-implementing the
intelligence for every language. MCP lets any model talk to any
data source or tool without each model vendor re-implementing
the connector for every system.&lt;/p&gt;
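&lt;p&gt;To make the wire format concrete, here is a minimal sketch of the two messages a client sends most often: a &lt;code&gt;tools/list&lt;/code&gt; request for discovery and a &lt;code&gt;tools/call&lt;/code&gt; request for invocation. The envelope is plain JSON-RPC 2.0. The &lt;code&gt;rpc_request&lt;/code&gt; helper, the &lt;code&gt;search_tickets&lt;/code&gt; tool and its arguments are made up for illustration; a real client would use an MCP SDK rather than building messages by hand.&lt;/p&gt;

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids; any value unique per request works


def rpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope of the kind MCP uses."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Discovery: ask the server which tools it exposes.
list_tools = rpc_request("tools/list")

# Invocation: call one tool by name with structured arguments.
# "search_tickets" and its arguments are hypothetical.
call_tool = rpc_request(
    "tools/call",
    {"name": "search_tickets", "arguments": {"project": "PLAT", "status": "open"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```

&lt;p&gt;Nothing in either message names a model vendor, which is the whole point of the LSP comparison.&lt;/p&gt;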
&lt;h2&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;The status quo without MCP is well-known to anyone who has built
an agent against multiple downstream systems. A typical internal
agent today reaches into Confluence for documentation, into Jira
for tickets, into Salesforce for customer records, into the
company data warehouse for analytics, into the CI system for
build status, into an internal directory for people lookups,
and into half a dozen smaller internal systems. Each of those
integrations is either:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A bespoke OpenAPI-driven tool definition wrapped in a vendor-specific
function-calling format (Anthropic&#39;s, OpenAI&#39;s, Google&#39;s — all
similar but not identical).&lt;/li&gt;
&lt;li&gt;A retrieval pipeline that ingests data from the source, embeds it,
stores it in a vector database, and surfaces it through RAG.&lt;/li&gt;
&lt;li&gt;A direct API call from the agent code, with the model&#39;s reasoning
injecting parameters into the call.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of these is brittle in its own way. Bespoke tool definitions
break when the agent moves to a different model vendor; you re-write
the wrappers. RAG pipelines drift from the source data and require
re-ingestion every time the source schema changes; they also lose
the fine-grained permissions model of the source system. Direct
API calls couple the agent code tightly to the downstream system&#39;s
API surface; you re-write the integration every time the API changes.&lt;/p&gt;
&lt;p&gt;MCP gives you a different shape. The downstream system exposes an
MCP server. The agent, regardless of model vendor, talks to that
server through the protocol. The server handles permissions, the
discovery, the streaming, the structured responses. The agent code
doesn&#39;t change when you switch models. The integration doesn&#39;t
change when you switch agents. The downstream system&#39;s API can
evolve independently of the agent layer.&lt;/p&gt;
&lt;p&gt;There are several specific implementation wins:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fine-grained permission delegation.&lt;/strong&gt; A remote MCP server can
act on the end-user&#39;s identity: in the OAuth flow the server is the
resource server, and the MCP client presents a bearer token issued
on the user&#39;s behalf. The permissions the server applies are the
permissions of the actual user. This is the correct security model
for an enterprise tool; the alternative is
service-account-with-superset-permissions, which is the source of
half the data-leak incidents I have seen with internal agents.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Discoverability over hand-wired configuration.&lt;/strong&gt; The agent
discovers what tools and resources are available rather than
having them baked into a config file. New capability becomes
available without re-deploying the agent. Removed capability
becomes unavailable without manual cleanup.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Standardised audit metadata.&lt;/strong&gt; Every MCP tool call carries a
request ID, a tool name, a parameter set, and a return value; stamp
the time at the logging layer and you have the same record structure
regardless of which server you are calling. The
&lt;a href=&quot;/blog/auditing-agent-decisions/&quot;&gt;audit framework I wrote about earlier&lt;/a&gt;
becomes much easier to implement uniformly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Transport portability.&lt;/strong&gt; Stdio for local servers, Streamable
HTTP (the successor to the original HTTP+SSE transport) for remote
servers, custom transports where needed. The agent code doesn&#39;t
care which.&lt;/li&gt;
&lt;/ul&gt;
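&lt;p&gt;As an illustration of the audit point in the list above, the sketch below flattens one &lt;code&gt;tools/call&lt;/code&gt; request and its response into a uniform record. The &lt;code&gt;AuditRecord&lt;/code&gt; fields, the server name and the user identity are assumptions of this example, and the timestamp is stamped by the logging code here rather than read from the message.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AuditRecord:
    request_id: Any   # the JSON-RPC id of the call
    timestamp: str    # stamped by this logger (assumption of the sketch)
    server: str       # which MCP server handled the call
    user: str         # end-user the call was made on behalf of
    tool: str
    arguments: dict
    is_error: bool


def audit_record(request: dict, response: dict, server: str, user: str) -> AuditRecord:
    """Flatten one tools/call request and its response into a uniform record."""
    if request.get("method") != "tools/call":
        raise ValueError("only tools/call requests are audited in this sketch")
    if request.get("id") != response.get("id"):
        raise ValueError("response does not match request id")
    params = request["params"]
    # A call fails either at the protocol level ("error") or inside the tool.
    failed = "error" in response or bool(response.get("result", {}).get("isError"))
    return AuditRecord(
        request_id=request["id"],
        timestamp=datetime.now(timezone.utc).isoformat(),
        server=server,
        user=user,
        tool=params["name"],
        arguments=params.get("arguments", {}),
        is_error=failed,
    )


# Hypothetical call to a CI server's "get_build_status" tool.
request = {"jsonrpc": "2.0", "id": 7, "method": "tools/call",
           "params": {"name": "get_build_status", "arguments": {"pipeline": "main"}}}
response = {"jsonrpc": "2.0", "id": 7,
            "result": {"content": [{"type": "text", "text": "green"}], "isError": False}}
record = audit_record(request, response, server="ci", user="alice@example.com")
print(json.dumps(asdict(record)))
```

&lt;p&gt;Because the record shape does not depend on which server was called, the same function can sit in front of every integration.&lt;/p&gt;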
&lt;h2&gt;Where adoption actually is&lt;/h2&gt;
&lt;p&gt;The disconnect between MCP&#39;s design quality and enterprise adoption
is real and worth being specific about.&lt;/p&gt;
&lt;p&gt;In &lt;strong&gt;consumer tooling&lt;/strong&gt;, MCP has effectively won. Claude Desktop,
Cursor, Claude Code, Continue, Windsurf, and a growing set of other
clients all speak MCP natively. The ecosystem of public MCP servers
is in the hundreds and growing: for GitHub, GitLab, Linear, Notion,
Slack, Sentry, and nearly every other major SaaS product there is
either an official server or a community one.&lt;/p&gt;
&lt;p&gt;In &lt;strong&gt;AI-first companies and developer-tooling startups&lt;/strong&gt;, MCP is
the default. The pattern is: build the product, expose an MCP
server, let the customer&#39;s agent connect.&lt;/p&gt;
&lt;p&gt;In &lt;strong&gt;regulated enterprise&lt;/strong&gt;, the picture is different. The architecture
functions I have talked to in financial services, healthcare,
utilities, and government are in some combination of these states:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Aware of MCP but treating it as a &amp;quot;consumer thing&amp;quot;.&lt;/li&gt;
&lt;li&gt;Concerned about the security model and waiting for &amp;quot;enterprise MCP&amp;quot;.&lt;/li&gt;
&lt;li&gt;Running pilots but not in production.&lt;/li&gt;
&lt;li&gt;Building bespoke wrappers that solve the same problem badly.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The mismatch is striking. The protocol is more rigorous than what
most companies have built internally. The security model is better
than service-account-everything. The audit story is cleaner. And
yet the adoption curve is shallow in exactly the segment that would
benefit most.&lt;/p&gt;
&lt;h2&gt;Why enterprise hasn&#39;t moved&lt;/h2&gt;
&lt;p&gt;The reasons I have actually heard, ranked roughly by how often I
hear them:&lt;/p&gt;
&lt;h3&gt;&amp;quot;It feels too new&amp;quot;&lt;/h3&gt;
&lt;p&gt;The honest one. MCP is eighteen months old. Enterprise procurement
cycles do not move at eighteen-month tempo. The CISO will not sign
off on a protocol that doesn&#39;t yet have multiple cycles of
production deployment in regulated industries behind it.&lt;/p&gt;
&lt;p&gt;This is a reasonable concern but it has gotten weaker over time.
By mid-2026, MCP has been deployed by enterprises including several
large banks, healthcare providers, and at least one defence
contractor I have seen a reference for. The &amp;quot;too new&amp;quot; objection is
becoming &amp;quot;too new for our particular sector, where we want to see
two more reference customers first&amp;quot;. That last layer of friction
peels off through 2026 and 2027.&lt;/p&gt;
&lt;h3&gt;&amp;quot;It feels like vendor lock-in&amp;quot;&lt;/h3&gt;
&lt;p&gt;This one is wrong but understandable. The intuition is that MCP is
Anthropic&#39;s protocol and adopting it ties you to Anthropic. The
reality is the opposite: MCP is the thing that breaks the vendor
lock-in that bespoke tool wrappers were creating.&lt;/p&gt;
&lt;p&gt;If your agent is built against Anthropic&#39;s function-calling API
specifically, you have an Anthropic-shaped integration that needs
rewriting if you ever move to a different model. If your agent
is built against MCP, the same MCP server works with any MCP-capable
client, regardless of which model vendor is behind it. Adopting
MCP is the un-locking move, not the locking move.&lt;/p&gt;
&lt;p&gt;The protocol itself is open. The spec is on GitHub under an
MIT-style licence. Anthropic, OpenAI, Google, and the major IDE
vendors all participate in the spec. It is not Anthropic&#39;s
proprietary thing.&lt;/p&gt;
&lt;h3&gt;&amp;quot;The security model is not enterprise-grade&amp;quot;&lt;/h3&gt;
&lt;p&gt;This is partially right and getting less right over time.&lt;/p&gt;
&lt;p&gt;The MCP spec as initially shipped had some gaps from an
enterprise-security perspective. Specifically, the auth flow was
under-specified, the discovery layer didn&#39;t have a great answer
for &amp;quot;how does the client know which servers are approved&amp;quot;, and
the per-tool permissions story was thin.&lt;/p&gt;
&lt;p&gt;By mid-2026 most of these gaps have been addressed in the protocol
itself or in canonical patterns around it. OAuth 2.1 with PKCE is
now the standard auth flow for remote MCP servers. Server registries
let the enterprise control which servers a given client is allowed
to discover. Per-tool permission scoping is a standard pattern.
The remaining concerns are real but smaller, and most of them are
also concerns with the bespoke alternatives — they are just more
visible with MCP because the protocol is explicit about what is
happening.&lt;/p&gt;
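&lt;p&gt;For readers who have not worked with PKCE, the client-side part is small. The sketch below generates a code verifier and derives the S256 code challenge as PKCE defines it: the base64url encoding of the SHA-256 digest, with padding stripped. The function names are my own, and everything else in the flow (redirects, token exchange, storage) is deliberately left out.&lt;/p&gt;

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Random verifier; token_urlsafe(32) yields 43 URL-safe characters."""
    return secrets.token_urlsafe(32)


def code_challenge_s256(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) with padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
challenge = code_challenge_s256(verifier)

# The client sends the challenge (with code_challenge_method=S256) on the
# authorization request and the verifier on the later token request.
print(len(verifier), len(challenge))
```

&lt;p&gt;The design choice worth noticing is that the verifier never leaves the client until the token request, so an intercepted authorization code is not enough on its own.&lt;/p&gt;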
&lt;h3&gt;&amp;quot;We don&#39;t trust the auth flow&amp;quot;&lt;/h3&gt;
&lt;p&gt;Specific version of the above. The worry is that a malicious or
compromised MCP server could exfiltrate data by tricking the model
into calling it.&lt;/p&gt;
&lt;p&gt;This is a real risk class but the answer is the same as for any
tool-calling architecture: a server allowlist at the client level
(the agent will only talk to MCP servers on its approved list),
egress policies on tool calls (the model cannot exfiltrate data
through a tool that has no permitted destination), and
prompt-injection mitigation at the model layer (which is an
unsolved problem for tool calling generally, not specifically for
MCP).&lt;/p&gt;
&lt;p&gt;The protocol does not solve prompt injection. Nothing does, yet.
But MCP does not make it worse than the bespoke alternatives, and
in some specific ways it makes it better — the structured nature
of MCP calls gives you more places to enforce policy than free-form
agent-API calls do.&lt;/p&gt;
&lt;h3&gt;&amp;quot;We can build it ourselves&amp;quot;&lt;/h3&gt;
&lt;p&gt;The most expensive of the objections. The architecture function or
the AI platform team has built a bespoke internal protocol that
looks a lot like MCP but predates it, or that the team finds easier
to reason about, or that integrates better with the company&#39;s
existing identity infrastructure.&lt;/p&gt;
&lt;p&gt;The cost of this is the same as any custom-protocol cost: every
new tool integration, every new model client, every new vendor
relationship goes through the custom layer. The team is now
maintaining a small specification, a client library, a server SDK,
and the integrations between them. None of that produces business
value. All of it is reinvention.&lt;/p&gt;
&lt;p&gt;The right move for a team in this position is usually a graceful
migration: keep the existing protocol for the existing integrations,
adopt MCP for new ones, deprecate the custom layer over a year.
The cost of running two protocols for a year is much smaller than
the cost of running a custom protocol forever.&lt;/p&gt;
&lt;h2&gt;Three integration patterns for regulated enterprises&lt;/h2&gt;
&lt;p&gt;The framework for adopting MCP at enterprise scale comes down
to three patterns, depending on what is being integrated. These
are the patterns I would use if I were running this build today.&lt;/p&gt;
&lt;h3&gt;Pattern 1: First-party MCP servers for internal systems&lt;/h3&gt;
&lt;p&gt;For internal systems that the architecture function controls or
has influence over — the EA platform, the application portfolio,
the capability model, the PMO data, internal documentation — build
a first-party MCP server that exposes those systems through the
protocol.&lt;/p&gt;
&lt;p&gt;Why first-party: the team that knows the data model is the right
team to define what gets exposed. The permissions model can be
faithful to the source system&#39;s permissions. The tool definitions
can be precise rather than approximate. Audit logging can be
co-located with the system itself.&lt;/p&gt;
&lt;p&gt;What this looks like in practice: an MCP server that runs alongside
your internal application, exposing a small number of tools (typically
five to twenty per system, not hundreds), each one mapped to a
specific use case. Not every API endpoint becomes an MCP tool —
that produces an unusable surface area for the model. Curate.&lt;/p&gt;
&lt;p&gt;The internal-systems-first pattern is exactly what I would do for
the &lt;a href=&quot;/blog/case-study-meridian/&quot;&gt;Meridian&lt;/a&gt; platform: expose an MCP
server that lets any other internal agent query the application
portfolio, the capability model, and the &lt;a href=&quot;/blog/case-study-canvas/&quot;&gt;CANVAS&lt;/a&gt;
workflow records, with permissions enforced by the same identity
gateway the application already uses.&lt;/p&gt;
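&lt;p&gt;A minimal sketch of what &amp;quot;curate&amp;quot; means in practice: two tools over a toy application portfolio, each with the name, description and JSON Schema input definition that a &lt;code&gt;tools/list&lt;/code&gt; response carries. The portfolio data, tool names and dispatch code are all hypothetical, and a real server would be built on an MCP SDK with the identity gateway in front of it rather than on a bare dictionary.&lt;/p&gt;

```python
import json

# Hypothetical in-memory stand-in for an application-portfolio system.
_PORTFOLIO = {
    "APP-001": {"name": "Payments Hub", "owner": "treasury", "lifecycle": "invest"},
    "APP-002": {"name": "Legacy Ledger", "owner": "finance", "lifecycle": "retire"},
}


def get_application(app_id: str) -> dict:
    return _PORTFOLIO[app_id]


def list_applications_by_lifecycle(lifecycle: str) -> list:
    return [app for app in _PORTFOLIO.values() if app["lifecycle"] == lifecycle]


# Each entry pairs a handler with the descriptor the model will see.
TOOLS = {
    "get_application": {
        "handler": get_application,
        "description": "Fetch one application record by its portfolio id.",
        "inputSchema": {
            "type": "object",
            "properties": {"app_id": {"type": "string"}},
            "required": ["app_id"],
        },
    },
    "list_applications_by_lifecycle": {
        "handler": list_applications_by_lifecycle,
        "description": "List applications in a lifecycle state such as invest or retire.",
        "inputSchema": {
            "type": "object",
            "properties": {"lifecycle": {"type": "string"}},
            "required": ["lifecycle"],
        },
    },
}


def list_tools() -> dict:
    """Shape of a tools/list result: name, description, inputSchema per tool."""
    return {
        "tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
    }


def call_tool(name: str, arguments: dict) -> dict:
    """Dispatch a tools/call; unknown tools are rejected rather than guessed."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    result = TOOLS[name]["handler"](**arguments)
    return {"content": [{"type": "text", "text": json.dumps(result)}], "isError": False}


print(call_tool("get_application", {"app_id": "APP-002"}))
```

&lt;p&gt;Two narrowly described tools give the model far less room to misuse the system than a one-to-one mirror of every API endpoint would.&lt;/p&gt;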
&lt;h3&gt;Pattern 2: Vendor-provided MCP servers for SaaS systems&lt;/h3&gt;
&lt;p&gt;For SaaS systems your company already uses (Salesforce, Atlassian,
Linear, Slack, your data warehouse, your CI system), use the
vendor&#39;s official MCP server if they have one. Most major SaaS
vendors do by mid-2026; the rest will within a year.&lt;/p&gt;
&lt;p&gt;Why vendor-provided: the vendor maintains the integration, including
keeping it in sync with their own API evolution. You inherit their
permissions model, their rate limiting, their authentication. You
do not write integration code.&lt;/p&gt;
&lt;p&gt;What to validate before deploying: the vendor&#39;s MCP server runs in
a region consistent with your data residency requirements, the auth
flow is OAuth 2.1 with PKCE (or stronger), the tool returns can be
constrained to specific scopes, the audit log is exportable to your
own infrastructure. Same diligence as for any third-party SaaS
integration.&lt;/p&gt;
&lt;p&gt;Where the vendor doesn&#39;t have an MCP server: assess whether the
community has built one with quality you can rely on (the MCP
servers list on GitHub is the de-facto registry), and if not, either
build your own thin wrapper around the vendor&#39;s API or wait. Do
not adopt low-quality community servers into a regulated production
environment without your own audit pass.&lt;/p&gt;
&lt;h3&gt;Pattern 3: A central MCP gateway&lt;/h3&gt;
&lt;p&gt;For the architecture function&#39;s overall posture, run an internal
MCP gateway that sits between agents and downstream servers. The
gateway provides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A registry of approved MCP servers. Agents discover capability
through the gateway, not through ad-hoc server lists.&lt;/li&gt;
&lt;li&gt;Auth proxying. The gateway holds the OAuth tokens for downstream
servers and exchanges them for short-lived credentials per agent
call. The agent never holds long-lived tokens.&lt;/li&gt;
&lt;li&gt;Audit logging. Every call through the gateway is logged centrally,
in the company&#39;s SIEM. This is the implementation point for the
&lt;a href=&quot;/blog/auditing-agent-decisions/&quot;&gt;audit framework&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Policy enforcement. The gateway can block tool calls that violate
policy (e.g., a tool that would exfiltrate PII to an external
destination, a tool from an un-approved server, a call with
parameters outside an allowed range).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This gateway pattern is the missing piece in most enterprise MCP
deployments. It is also the piece that turns MCP from &amp;quot;a protocol
the developers use&amp;quot; into &amp;quot;a governed enterprise capability&amp;quot;. Without
it, every agent makes its own decisions about which servers to
talk to and how. With it, the architecture function has a single
control point.&lt;/p&gt;
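&lt;p&gt;The registry and policy-enforcement duties listed above reduce to a check that runs before any call is forwarded. The sketch below is one possible shape under simple assumptions: an in-memory allowlist of servers and tools, an optional numeric ceiling per argument, and a list standing in for the SIEM. The &lt;code&gt;Gateway&lt;/code&gt; and &lt;code&gt;Decision&lt;/code&gt; classes and all server and tool names are invented for the example; auth proxying is omitted.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    allowed: bool
    reason: str


@dataclass
class Gateway:
    # server name -> set of tool names agents may call on it
    approved: dict
    # (server, tool, argument name) -> maximum permitted numeric value
    max_values: dict = field(default_factory=dict)
    # stand-in for the central log; a real gateway would ship this to the SIEM
    audit_log: list = field(default_factory=list)

    def check(self, agent: str, server: str, tool: str, arguments: dict) -> Decision:
        decision = self._evaluate(server, tool, arguments)
        # Every decision, allowed or blocked, is recorded.
        self.audit_log.append({
            "agent": agent, "server": server, "tool": tool,
            "allowed": decision.allowed, "reason": decision.reason,
        })
        return decision

    def _evaluate(self, server: str, tool: str, arguments: dict) -> Decision:
        if server not in self.approved:
            return Decision(False, "server not in registry")
        if tool not in self.approved[server]:
            return Decision(False, "tool not approved on this server")
        for name, value in arguments.items():
            limit = self.max_values.get((server, tool, name))
            # Limits are assumed to be set only on numeric arguments.
            if limit is not None and value > limit:
                return Decision(False, f"{name} exceeds allowed maximum {limit}")
        return Decision(True, "ok")


gateway = Gateway(
    approved={"warehouse": {"run_query"}},
    max_values={("warehouse", "run_query", "row_limit"): 10_000},
)
ok = gateway.check("report-agent", "warehouse", "run_query", {"row_limit": 500})
blocked = gateway.check("report-agent", "warehouse", "run_query", {"row_limit": 1_000_000})
unknown = gateway.check("report-agent", "pastebin", "upload", {})
print(ok.reason, "|", blocked.reason, "|", unknown.reason)
```

&lt;p&gt;Logging the blocked calls as well as the allowed ones matters: the denials are usually the first evidence that an agent has been steered somewhere it should not go.&lt;/p&gt;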
&lt;p&gt;The gateway is build-yourself work today. By 2027 there will be
commercial gateway products. By 2028 the gateway will be a standard
piece of the AI platform stack alongside the LLM proxy and the
prompt registry.&lt;/p&gt;
&lt;h2&gt;What to implement first&lt;/h2&gt;
&lt;p&gt;If the architecture function is starting MCP adoption from zero
today, the order of operations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Month one: build one first-party MCP server for an internal
system you control.&lt;/strong&gt; Pick the system with the highest agent-query
volume in your existing setup. Expose it via MCP. Wire up one
agent (Claude Code in the development environment is the easiest
first client). Validate the auth flow, the audit logging, and
the permission delegation. This is the proof-of-life.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Months two and three: deploy the central gateway.&lt;/strong&gt; Build it
yourself if there is nothing on the market that fits your
requirements. Migrate the first-party server from month one
behind the gateway. Migrate one or two vendor MCP servers
(Atlassian, GitHub, your CI system) behind the gateway.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Months four to six: standardise on MCP for new integrations.&lt;/strong&gt;
Any new agent-to-system integration is required to go through
MCP, with an architectural exception process for cases where that
is not yet possible. The short-term friction this constraint adds
is small; the long-term simplification is significant.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Month seven onwards: migrate the legacy.&lt;/strong&gt; Deprecate the existing
bespoke integration layer. Pick the top three highest-traffic
custom integrations; rewrite them as MCP servers. Stop maintaining
the old layer.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;By the end of the year you have an MCP-first agent platform. The
new integration cost has dropped. The vendor-lock-in surface has
shrunk. The audit posture is uniform across agents and tools. The
architecture function has a clean abstraction to reason about.&lt;/p&gt;
&lt;h2&gt;The honest limitations&lt;/h2&gt;
&lt;p&gt;To be balanced about it: MCP is not a complete answer to every
agent-integration problem.&lt;/p&gt;
&lt;p&gt;The protocol does not solve prompt injection. An MCP-mediated tool
call is as exposed to malicious prompts as a direct API call. The
defences are at the model layer (prompt-injection-resistant system
prompts, output validation, sandboxing of high-risk tools) and at
the gateway layer (policy enforcement on tool calls), not at the
protocol layer.&lt;/p&gt;
&lt;p&gt;The protocol does not specify a great answer for &lt;strong&gt;long-running
operations&lt;/strong&gt;. A tool call that takes minutes (a complex database
query, a large analytics job, an external system that&#39;s slow to
respond) is awkward in MCP today. There are extension patterns
emerging (async tool calls with callbacks, polling endpoints, job
handles) but they are not universal. For workloads that require
long-running operations, you will need an extension or a workaround.&lt;/p&gt;
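&lt;p&gt;One of the workarounds mentioned above, the job handle, can be sketched as a pair of tools: one that starts the work and returns immediately, and one that the agent polls. This is an illustration of the pattern rather than anything the protocol prescribes; &lt;code&gt;start_job&lt;/code&gt;, &lt;code&gt;get_job_status&lt;/code&gt; and the in-process thread pool are assumptions, and a production version would persist job state somewhere durable.&lt;/p&gt;

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

_executor = ThreadPoolExecutor(max_workers=4)
_jobs: dict = {}  # job_id -> Future; in-memory only for this sketch


def start_job(work: Callable[[], str]) -> dict:
    """First tool: kick off the slow work and return a handle immediately."""
    job_id = str(uuid.uuid4())
    _jobs[job_id] = _executor.submit(work)
    return {"job_id": job_id, "status": "running"}


def get_job_status(job_id: str) -> dict:
    """Second tool: the agent polls with the handle until the job finishes."""
    future = _jobs.get(job_id)
    if future is None:
        return {"job_id": job_id, "status": "unknown"}
    if not future.done():
        return {"job_id": job_id, "status": "running"}
    error = future.exception()
    if error is not None:
        return {"job_id": job_id, "status": "failed", "error": str(error)}
    return {"job_id": job_id, "status": "done", "result": future.result()}


def slow_query() -> str:
    time.sleep(0.05)  # stands in for a long analytics job
    return "42 rows"


handle = start_job(slow_query)
status = get_job_status(handle["job_id"])
while status["status"] == "running":
    time.sleep(0.01)
    status = get_job_status(handle["job_id"])
print(status)
```

&lt;p&gt;The trade-off is that the model has to be told, in the tool descriptions, that the first call returns a handle and not a result.&lt;/p&gt;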
&lt;p&gt;The protocol does not specify a great answer for &lt;strong&gt;bidirectional
streaming inside a tool call&lt;/strong&gt;. Pure unidirectional streaming
(server to client) is well-supported. Mid-call user prompts, agent
clarifications, or interactive flows inside a tool call are not
fully standard.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;vendor-implementation maturity varies&lt;/strong&gt;. Some MCP servers
are excellent (the major vendor-provided ones tend to be good).
Some are early-stage and not production-quality. The &amp;quot;is this
server production-ready&amp;quot; assessment is not standardised; you have
to do your own diligence.&lt;/p&gt;
&lt;p&gt;These limitations are real. None of them is a reason to ignore
the protocol. All of them are addressable in the integration
pattern you choose.&lt;/p&gt;
&lt;h2&gt;Where this is going&lt;/h2&gt;
&lt;p&gt;A short prediction. By the end of 2027:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MCP is the default protocol for new agent-to-system integrations
in the enterprise. The number of teams writing bespoke function-calling
wrappers will be small and shrinking.&lt;/li&gt;
&lt;li&gt;A handful of commercial MCP gateway products will exist, with
meaningful market share. The build-yourself gateway becomes a
niche choice rather than the only choice.&lt;/li&gt;
&lt;li&gt;Nearly every major SaaS vendor will have an official MCP server. The
ones that don&#39;t will be at a competitive disadvantage in AI-augmented
workflows.&lt;/li&gt;
&lt;li&gt;The protocol itself will have stabilised. Its first major breaking
revision will have happened with reasonable backward compatibility. The
specification will look more like LSP does today — boring, mature,
reliable.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The companies that adopt MCP in 2026 will benefit from being
ahead of the curve when the gateway market matures and the SaaS
ecosystem fills out. The companies that wait until 2028 will be
re-architecting then.&lt;/p&gt;
&lt;p&gt;If you are running an agent program in a regulated company right
now and you do not have an MCP strategy, you have a strategy gap
that is going to cost you. The pieces are there to adopt today.
The adoption cost is low and falling. The integration surface
area you are currently building bespoke is going to be the thing
you regret most when MCP becomes the default. Move now.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Cursor in a regulated industry: the actual policy you need</title>
    <link href="https://tarun.bulchandanis.com/blog/cursor-regulated-policy/"/>
    <id>https://tarun.bulchandanis.com/blog/cursor-regulated-policy/</id>
    <updated>2026-05-28T00:00:00.000Z</updated>
    <published>2026-05-28T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Most regulated companies have either banned AI coding tools entirely or quietly rubber-stamped them. Neither is right. Here is the policy I would write today, with the specific clauses, the vendor-specific configurations, and the audit trail it requires.</summary>
    <content type="html">&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;p&gt;There are now four serious AI coding tools in widespread enterprise
use: Cursor, Claude Code, GitHub Copilot, and Windsurf. Every
regulated company I have talked to is in one of two states with
respect to them. Either they have officially banned them (developers
use them anyway, on personal devices, with company code) or they
have rubber-stamped them under an &amp;quot;AI policy&amp;quot; written by someone
who has not actually used the tools. Both are worse than doing
nothing, because both produce the illusion of governance without
the substance.&lt;/p&gt;
&lt;p&gt;This piece is the policy I would actually write today for a
regulated enterprise — a financial services firm, a healthcare
provider, a utility, a defence contractor. Six policy areas, each
one specific enough to enforce: data residency, prompt and code
logging, code-review attribution, intellectual property, secrets
handling, and third-party dependency exposure. For each area, the
question, the wrong answer, the right answer, and the enforcement
mechanism. Plus vendor-specific configurations for the four major
products and a note on what architecture owns versus what security
owns.&lt;/p&gt;
&lt;p&gt;If you are about to publish your company&#39;s &amp;quot;AI coding policy&amp;quot; this
quarter, read this first.&lt;/p&gt;
&lt;h2&gt;Why &amp;quot;ban it&amp;quot; and &amp;quot;approve it&amp;quot; both fail&lt;/h2&gt;
&lt;p&gt;The ban fails because developers use the tools anyway. They use them
on personal laptops, on personal accounts, with snippets of company
code pasted across. The code that gets pasted out is often the code
that is most in need of help — the stuck-debugging code, the
authentication module that has gone sideways, the SQL query the team
cannot get to perform. Exactly the wrong code to leak. The ban gives
the company plausible deniability while leaving every actual security
concern in place.&lt;/p&gt;
&lt;p&gt;The rubber-stamp fails because the people writing the policy have
not used the tools, so the policy they produce is one that nobody
can actually comply with. It will say things like &amp;quot;developers must not paste
sensitive code into AI tools&amp;quot; — which is technically true and
operationally meaningless, because the whole point of the tool is to
read your code. It will say &amp;quot;AI-generated code must be reviewed for
quality&amp;quot; — true of all code; how does AI change the review standard?
It will say &amp;quot;do not use AI tools for production code&amp;quot; — fine, but
then what is the boundary between non-production and production in
a continuous-delivery shop where every commit might end up in
production within hours?&lt;/p&gt;
&lt;p&gt;The result, in both cases, is that the actual governance question
goes unanswered. The actual governance question is: &lt;strong&gt;what is the
specific set of conditions under which an AI coding tool can read,
write, or influence company code, in a way that is auditable, that
respects regulatory and contractual obligations, and that does not
require asking individual developers to make impossible judgment
calls?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That is what a policy needs to answer. Here are the six clauses.&lt;/p&gt;
&lt;h2&gt;Clause 1: data residency&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;The question.&lt;/strong&gt; Where does the model that is reading or writing
my code physically run? Where does the data my code touches get
sent during inference? Whose jurisdiction has access to it under
what legal process?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The wrong answer.&lt;/strong&gt; &amp;quot;The AI vendor says they are GDPR-compliant.&amp;quot;
This is necessary but nowhere near sufficient. GDPR compliance is
about handling personal data lawfully, and most source code is not
personal data. A GDPR attestation does not tell you where inference
on your code actually runs, which is the part that matters
under FCA outsourcing rules, under the EU AI Act&#39;s high-risk
provisions, under DORA&#39;s third-party risk requirements, and under
sector-specific rules for healthcare, defence, and critical
infrastructure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The right answer.&lt;/strong&gt; A specific residency requirement, stated as a
hard constraint. For an EU-headquartered company in financial
services, that looks like: &amp;quot;model inference must occur in an EU
member state. The model provider must be able to provide written
attestation of the inference region for any given request, on
demand. If the inference is routed through a cloud hyperscaler,
the cloud region must be EU. If the underlying inference cannot
be confirmed to occur within the EU, the tool is not approved for
use against company code.&amp;quot;&lt;/p&gt;
&lt;p&gt;For a US-headquartered company in healthcare with HIPAA exposure,
the constraint is different but the structure is the same: a hard
region constraint with an attestation requirement.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The enforcement mechanism.&lt;/strong&gt; Procurement contracts include the
specific region clause. The architecture function reviews the
vendor&#39;s inference architecture annually. The security team has a
quarterly check that the configuration in use matches the contract.&lt;/p&gt;
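&lt;p&gt;The quarterly configuration check can be largely mechanical. A
minimal sketch, assuming each approved tool has its configured
inference region exported into a simple mapping: the config shape,
the &lt;code&gt;inference_region&lt;/code&gt; key, and the function name are all hypothetical,
and the region names are AWS-style identifiers used purely as an
example.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Explicit allowlist. A prefix match on "eu-" would wrongly admit
# eu-west-2 (London), which is not in an EU member state.
ALLOWED_REGIONS = {"eu-west-1", "eu-central-1", "eu-west-3", "eu-north-1"}


def check_residency(tool_configs, allowed=ALLOWED_REGIONS):
    """Return a list of violations; an empty list means the check passed."""
    violations = []
    for tool, config in tool_configs.items():
        region = config.get("inference_region")
        if region is None:
            violations.append(f"{tool}: no inference region configured")
        elif region not in allowed:
            violations.append(f"{tool}: region {region} is not approved")
    return violations
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Treating a missing region as a violation, rather than as a pass,
matches the hard-constraint wording of the clause: if the region
cannot be confirmed, the tool is not approved.&lt;/p&gt;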
&lt;p&gt;&lt;strong&gt;Practical complication.&lt;/strong&gt; Some of the tools can route
inference through a customer-owned cloud endpoint (Amazon Bedrock,
Azure OpenAI, or Google Vertex AI, depending on the tool; the
vendor tables later in this piece give the specifics).
This is the enterprise-grade answer. The tool runs against an
inference endpoint inside your own cloud account, in a region you
control, with logging you own. If the tool supports this and you
are in a regulated industry, this is the configuration to use. Do
not use the vendor&#39;s default consumer endpoint.&lt;/p&gt;
&lt;h2&gt;Clause 2: prompt and code logging&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;The question.&lt;/strong&gt; What record exists, after the fact, of what was
sent to the AI tool and what was returned? Where is that record
stored? Who can access it?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The wrong answer.&lt;/strong&gt; Either &amp;quot;we don&#39;t log it, to protect developer
privacy&amp;quot; or &amp;quot;the vendor logs it for 30 days, we trust them&amp;quot;.
Neither survives a serious audit. The first produces no evidence
at all. The second produces evidence that lives outside your
control, with a retention schedule the vendor sets, in a system
you cannot query.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The right answer.&lt;/strong&gt; Customer-owned logging of every AI tool
interaction, stored in the company&#39;s own SIEM or equivalent log
infrastructure, with retention matching the company&#39;s broader log
retention policy (typically two to seven years in regulated
industries). The log captures: the prompt, the response, the model
identifier, the timestamp, the user identity, the project context,
the IDE session ID. Same audit-grade discipline as for any other
sensitive system.&lt;/p&gt;
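&lt;p&gt;The record described above is small enough to pin down as a data
structure. A minimal sketch, assuming one JSON line per interaction
is what gets shipped to the log pipeline: the class name, field
names, and sample values are illustrative, not any vendor&#39;s schema.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import json
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class AIInteractionRecord:
    """One logged interaction; the fields mirror the list above."""
    prompt: str
    response: str
    model_id: str
    timestamp: str       # ISO 8601, UTC
    user_id: str
    project: str
    ide_session_id: str


def to_log_line(record):
    """Serialise a record as one JSON line."""
    return json.dumps(asdict(record), sort_keys=True)


def missing_fields(log_line):
    """Audit helper: required fields that are absent or empty in a line."""
    data = json.loads(log_line)
    names = [f.name for f in fields(AIInteractionRecord)]
    return sorted(name for name in names if not data.get(name))


record = AIInteractionRecord(
    prompt="Why does this query time out?",
    response="The join is missing an index.",
    model_id="example-model-v1",
    timestamp=datetime.now(timezone.utc).isoformat(),
    user_id="dev-1234",
    project="payments-service",
    ide_session_id="session-0001",
)
line = to_log_line(record)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A helper like &lt;code&gt;missing_fields&lt;/code&gt; is the kind of thing a sample audit
query can run over stored lines to confirm that every field is
actually being captured.&lt;/p&gt;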
&lt;p&gt;&lt;strong&gt;The enforcement mechanism.&lt;/strong&gt; Tool configuration that routes
logging to the company&#39;s infrastructure, not the vendor&#39;s.
Periodic audit that the configuration is in place. Sample queries
against the log to confirm prompts are actually being captured.
Disconnect from the tool any developer whose IDE session is not
logging properly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Practical complication.&lt;/strong&gt; Not all tools support customer-owned
logging at the granularity you need. Cursor&#39;s enterprise tier
supports prompt-level logging to a customer-owned destination.
Claude Code can be logged on your own side if you route its API
traffic through a proxy or gateway that you operate.
Copilot Enterprise supports audit log export.
Windsurf is more limited. The tool selection question and the
logging question are linked: pick a tool that supports the
logging discipline your audit requires.&lt;/p&gt;
&lt;p&gt;This is essentially the same instrumentation discipline I described
in &lt;a href=&quot;/blog/auditing-agent-decisions/&quot;&gt;how to audit a decision an agent made&lt;/a&gt;,
applied to a developer-facing tool.&lt;/p&gt;
&lt;h2&gt;Clause 3: code-review attribution&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;The question.&lt;/strong&gt; When AI generates or substantially modifies a
piece of code that lands in your repository, who is recorded as
the author, and how is that disclosed in code review?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The wrong answer.&lt;/strong&gt; Either &amp;quot;treat AI-generated code as if a
human wrote it&amp;quot; or &amp;quot;require a separate label on every line of
AI-touched code&amp;quot;.&lt;/p&gt;
&lt;p&gt;The first is the path most teams default to. It produces
unaccountable code. A bug six months later cannot be traced back to
the prompt that produced it. A regulator asking &amp;quot;who wrote this&amp;quot;
gets a developer name and no record of the AI involvement.&lt;/p&gt;
&lt;p&gt;The second is technically pure and practically unworkable. Modern
AI coding tools produce code in a continuous loop with the
developer. Pretending you can label every AI-influenced character
is silly. The label becomes either pervasive (every file is labelled
AI-touched, which conveys no information) or selective (the
developer decides what to label, which is exactly the judgment call
the policy was meant to remove).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The right answer.&lt;/strong&gt; Two separate records, kept distinct.&lt;/p&gt;
&lt;p&gt;First, a &lt;strong&gt;commit-level co-author attribution&lt;/strong&gt;: every commit that
incorporates significant AI assistance is marked with a &lt;code&gt;Co-Authored-By&lt;/code&gt;
trailer naming the AI tool and model version. This is the lightweight,
git-native disclosure. It does not claim to label every line; it
claims to label the commit as one where AI was substantially
involved. The threshold for &amp;quot;substantial&amp;quot; is a team norm, not a
policy clause — typically, &amp;quot;more than a single autocomplete suggestion&amp;quot;.&lt;/p&gt;
&lt;p&gt;Second, an &lt;strong&gt;out-of-band session log&lt;/strong&gt;: the prompt-and-response log
from Clause 2 captures the full record. The git commit links back
to the relevant session via a session ID in the commit message. The
git history shows what was committed; the session log shows how it
got there.&lt;/p&gt;
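&lt;p&gt;&lt;strong&gt;The enforcement mechanism.&lt;/strong&gt; A commit-time hook that asks the
developer whether AI assistance was used and adds the &lt;code&gt;Co-Authored-By&lt;/code&gt;
trailer if so. In git terms this is a &lt;code&gt;prepare-commit-msg&lt;/code&gt; or
&lt;code&gt;commit-msg&lt;/code&gt; hook, because a &lt;code&gt;pre-commit&lt;/code&gt; hook runs before the
commit message exists and cannot edit it. A CI check that any commit
marked as AI-assisted has a corresponding session ID. A code review
checklist item: &amp;quot;is the session log linked, if AI was used&amp;quot;.&lt;/p&gt;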
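&lt;p&gt;The CI check is simple to sketch. The version below assumes two
conventions that are not defined anywhere in this piece: that AI
co-author values start with the tool name, and that the session ID
travels in a hypothetical &lt;code&gt;AI-Session-ID&lt;/code&gt; trailer. Trailer parsing
is deliberately simplified compared with what git itself does.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;AI_TOOLS = ("cursor", "claude code", "github copilot", "windsurf")
SESSION_TRAILER = "ai-session-id"   # hypothetical trailer name


def parse_trailers(message):
    """Collect Key: value lines from the last paragraph of a message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    trailers = {}
    body = paragraphs[1:]            # the subject line is never a trailer
    if not body:
        return trailers
    for line in body[-1].splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and " " not in key:
            trailers.setdefault(key.lower(), []).append(value.strip())
    return trailers


def check_commit(message):
    """Return a list of policy errors for one commit message."""
    trailers = parse_trailers(message)
    co_authors = trailers.get("co-authored-by", [])
    ai_assisted = any(name.lower().startswith(AI_TOOLS) for name in co_authors)
    has_session = any(trailers.get(SESSION_TRAILER, []))
    if ai_assisted and not has_session:
        return ["AI-assisted commit has no AI-Session-ID trailer"]
    return []
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Human co-authors pass through untouched; only a commit that names
an AI tool as co-author without a session ID fails the check.&lt;/p&gt;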
&lt;p&gt;&lt;strong&gt;Practical complication.&lt;/strong&gt; Developers will not voluntarily mark
every AI-assisted commit. The hook can default to adding the
trailer when the IDE indicates AI activity. Cursor, Claude Code, and Copilot
all expose enough telemetry to the local environment that this is
detectable. Pure mandatory self-disclosure does not work; auto-detection
with a manual override does.&lt;/p&gt;
&lt;h2&gt;Clause 4: intellectual property&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;The question.&lt;/strong&gt; Who owns the code that an AI tool produces? Under
what licence? With what indemnity if the output reproduces something
covered by a third-party copyright?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The wrong answer.&lt;/strong&gt; &amp;quot;We accept the AI vendor&#39;s standard terms.&amp;quot;
The standard terms vary widely between vendors and most of them
shift more risk to the customer than a careful read would suggest.
Some vendors offer indemnity for output reproducing copyrighted
training material; some don&#39;t. Some retain the right to train on
your code; some don&#39;t. Some grant a perpetual licence to all code
produced through their tool; some don&#39;t.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The right answer.&lt;/strong&gt; A negotiated enterprise contract that explicitly
covers four things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;IP ownership of outputs.&lt;/strong&gt; Customer owns all code produced
through the tool against customer code. No exceptions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No training on customer code.&lt;/strong&gt; Vendor agrees not to use
customer code (prompts or outputs) to train future models.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Indemnity for output infringement.&lt;/strong&gt; If the customer ships
tool output that reproduces copyrighted material and then faces a
claim, the vendor indemnifies. Most major vendors now offer this
(Copilot, Cursor, and Claude Code all have some form of it on
enterprise tiers). Read the actual cap; the indemnity is often
dollar-limited.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data-handling terms that match the company&#39;s standard data
processing agreement.&lt;/strong&gt; If the vendor cannot meet the company&#39;s
standard DPA, that is itself a signal.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;The enforcement mechanism.&lt;/strong&gt; Legal review of the contract before
deployment. Annual recertification. If the vendor terms change
materially, re-review.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Practical complication.&lt;/strong&gt; Developers using personal accounts
against company code is the IP-leakage attack vector that the
policy needs to close. Even with the right contract for enterprise
licences, an individual developer using a free tier with company
code is operating outside the negotiated terms. Tool access has to
be SSO-enforced and personal accounts have to be blocked at the
network layer or the device-management layer.&lt;/p&gt;
&lt;h2&gt;Clause 5: secrets handling&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;The question.&lt;/strong&gt; What stops a developer from accidentally pasting
a production API key, a database password, or a customer record
into an AI tool?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The wrong answer.&lt;/strong&gt; &amp;quot;We tell developers not to do this.&amp;quot; The
training-and-awareness approach has a well-documented track record
of not working. Developers paste secrets into Stack Overflow. They
paste secrets into bug-tracker tickets. They paste secrets into AI
tools.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The right answer.&lt;/strong&gt; A pre-flight scrubbing layer that intercepts
prompts before they leave the developer&#39;s machine. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A local-machine prompt-scanning hook integrated with the IDE.
Scans for high-entropy strings, known credential formats (AWS
keys, Azure connection strings, JWTs, OpenAI keys, etc.), and
PII patterns (NHS numbers, NI numbers, credit card numbers).&lt;/li&gt;
&lt;li&gt;If a secret is detected, the prompt is blocked from being sent.
The developer sees a warning explaining what was caught.&lt;/li&gt;
&lt;li&gt;The blocked event is logged. The same audit framework as for
successful prompts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is not a perfect defence — context-dependent secrets (an
internal hostname, a customer&#39;s company name) are not scannable —
but it eliminates the catastrophic-and-common case of an actual API
key going to a vendor.&lt;/p&gt;
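&lt;p&gt;The core of such a scanner is small. A minimal sketch, covering
only three credential formats and payment card numbers: the pattern
set is illustrative and far from complete, the function names are
hypothetical, and entropy scoring is left out for brevity. The Luhn
checksum is there to avoid flagging every long order number as a
card.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import re

PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "jwt": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Luhn checksum, used to cut false positives on long digit runs."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            value = value // 10 + value % 10
        total += value
    return total % 10 == 0


def scan_prompt(prompt):
    """Return the sorted names of every secret type found in the prompt."""
    found = {name for name, rx in PATTERNS.items() if rx.search(prompt)}
    for match in CARD_CANDIDATE.finditer(prompt):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            found.add("payment_card")
    return sorted(found)


def send_allowed(prompt):
    """Block the prompt if anything at all was detected."""
    return not scan_prompt(prompt)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Returning the names of what was caught, rather than a bare
yes or no, is what lets the IDE show the developer a useful warning
and lets the blocked event be logged with a reason.&lt;/p&gt;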
&lt;p&gt;&lt;strong&gt;The enforcement mechanism.&lt;/strong&gt; Mandatory installation of the
scrubbing hook on every developer machine running an approved AI
tool. Periodic check that it is running. Alerts on bypass attempts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Practical complication.&lt;/strong&gt; The scrubbing layer adds latency to
every prompt. Developers will work around it if the latency is bad.
Tune for sub-100ms scrubbing time. Most current scanners can hit
this if the regex library is reasonable.&lt;/p&gt;
&lt;h2&gt;Clause 6: third-party dependency exposure&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;The question.&lt;/strong&gt; When the AI tool suggests using a third-party
library, what stops it from suggesting a malicious or vulnerable
one?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The wrong answer.&lt;/strong&gt; &amp;quot;The developer will check.&amp;quot; Developers do not
check. Developers check less when the suggestion looks confident.
A library suggestion that comes wrapped in a fluent explanation of
why it is the right choice gets less scrutiny than a library
suggestion they found themselves.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The right answer.&lt;/strong&gt; The same software supply chain controls that
should already exist, with the AI tool&#39;s suggestions explicitly in
scope:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An allowlist of permitted packages and package sources. If the AI
tool suggests a library that is not on the allowlist, the
suggestion is blocked at IDE level.&lt;/li&gt;
&lt;li&gt;A vulnerability scanner that runs on every dependency added,
whether suggested by AI or by a human. CVE thresholds match the
company&#39;s broader vulnerability policy.&lt;/li&gt;
&lt;li&gt;A typosquatting check: a library name that is very close to but
not exactly a popular package name is flagged. This is the attack
vector most often documented when AI tools suggest unsafe
packages.&lt;/li&gt;
&lt;li&gt;A &amp;quot;hallucinated package&amp;quot; check: if the AI suggests a library that
does not exist in the company&#39;s package registry mirror, the
suggestion is blocked. Hallucinated packages are an emerging
vector for supply-chain attacks because they pre-create demand
that an attacker can then satisfy by publishing a malicious
package under the same name.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;The enforcement mechanism.&lt;/strong&gt; The package allowlist is maintained
by the security team and consumed by the IDE plugin. The scanner is
part of the CI pipeline. The typosquatting check is in the IDE
plugin.&lt;/p&gt;
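&lt;p&gt;Three of the four checks above (the mirror lookup, the typosquatting
check, and the allowlist) can be combined into one classification
step. A minimal sketch using the standard library&#39;s &lt;code&gt;difflib&lt;/code&gt; for name
similarity: the function name, the sample popular-package set, and
the 0.85 similarity cutoff are all assumptions that would need
tuning against a real registry.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import difflib

# Illustrative only; a real list would come from registry download stats.
POPULAR = {"requests", "numpy", "pandas", "urllib3", "cryptography"}


def check_suggestion(name, allowlist, mirror_index, popular=POPULAR):
    """Classify one suggested package name as allowed, flagged, or blocked."""
    name = name.lower()
    if name not in mirror_index:
        return ("blocked", "not in registry mirror (possible hallucination)")
    if name not in popular:
        close = difflib.get_close_matches(name, sorted(popular), n=1, cutoff=0.85)
        if close:
            return ("flagged", f"name is close to {close[0]} (possible typosquat)")
    if name not in allowlist:
        return ("blocked", "not on the allowlist; use the expedited approval path")
    return ("allowed", "")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The mirror lookup runs first because it is the cheapest check and
catches names that do not exist at all; the similarity check only
runs on names that exist but are not themselves well known.&lt;/p&gt;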
&lt;p&gt;&lt;strong&gt;Practical complication.&lt;/strong&gt; New legitimate libraries are added to
ecosystems daily. The allowlist needs an expedited approval path
or it will be ignored. Plan for that.&lt;/p&gt;
&lt;h2&gt;Vendor-specific configurations&lt;/h2&gt;
&lt;p&gt;The six clauses above are tool-agnostic. The configurations to
implement them vary. A quick reference for the four major tools as
of mid-2026:&lt;/p&gt;
&lt;h3&gt;Cursor&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Clause&lt;/th&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Residency&lt;/td&gt;
&lt;td&gt;Cursor Business / Enterprise plans support routing inference to customer-owned LLM endpoints (Bedrock, Azure OpenAI, Vertex AI). Use that. The default consumer endpoint routes through the vendor&#39;s infrastructure with less control.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging&lt;/td&gt;
&lt;td&gt;Enterprise tier supports logging to a customer-owned destination. The logs include the prompt, the response, the file context, and the user identity. Confirm during contract that this can be exported to your SIEM.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IP terms&lt;/td&gt;
&lt;td&gt;Enterprise contract includes indemnity for output infringement and no-training-on-customer-data terms. Free tier does not. Block free tier at the SSO layer.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSO&lt;/td&gt;
&lt;td&gt;Cursor supports SCIM provisioning and SAML SSO on enterprise. Required.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Claude Code&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Clause&lt;/th&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Residency&lt;/td&gt;
&lt;td&gt;Claude Code can run against the Anthropic API directly or through Amazon Bedrock or Google Vertex AI. The Bedrock and Vertex options give you the customer-owned inference region. Use those for regulated workloads.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging&lt;/td&gt;
&lt;td&gt;Anthropic offers enterprise audit logging. Bedrock and Vertex give you CloudTrail-equivalent logging. Both are workable; the Bedrock/Vertex path is more aligned with existing enterprise log discipline.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IP terms&lt;/td&gt;
&lt;td&gt;Anthropic enterprise contract offers output indemnity and no-training commitment. Read the indemnity cap; it is non-trivial but bounded.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSO&lt;/td&gt;
&lt;td&gt;Required on enterprise tier.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;GitHub Copilot Enterprise&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Clause&lt;/th&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Residency&lt;/td&gt;
&lt;td&gt;Microsoft routes Copilot inference through Azure OpenAI infrastructure. The customer can request a specific region. EU customers should specify an EU region in the contract.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging&lt;/td&gt;
&lt;td&gt;Copilot Enterprise has audit log export. The grain is per-suggestion rather than per-prompt; the model context the suggestion was based on is captured. Sufficient for most audit purposes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IP terms&lt;/td&gt;
&lt;td&gt;Microsoft indemnity is the most generous in the market and the most extensively litigated. Read the carve-outs (notably for &amp;quot;duplicate detection turned off&amp;quot;). Leave duplicate detection on.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSO&lt;/td&gt;
&lt;td&gt;Copilot Enterprise requires GitHub Enterprise; enforce SSO at that level. Required.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Windsurf&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Clause&lt;/th&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Residency&lt;/td&gt;
&lt;td&gt;Less mature in this area as of mid-2026. Limited customer-owned-endpoint options. For regulated workloads, treat as conditional approval at best.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging&lt;/td&gt;
&lt;td&gt;Limited enterprise logging options.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IP terms&lt;/td&gt;
&lt;td&gt;Newer to enterprise contracting; terms are less standardised.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSO&lt;/td&gt;
&lt;td&gt;Available on enterprise tier.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The pattern: for serious regulated workloads, Cursor (with a
customer-owned LLM endpoint), Claude Code (via Bedrock or Vertex),
or Copilot Enterprise are the workable choices. Windsurf is fine
for non-regulated workloads but does not yet have the enterprise
controls the other three have.&lt;/p&gt;
&lt;h2&gt;What architecture owns vs what security owns&lt;/h2&gt;
&lt;p&gt;A short note on the politics of this, because every regulated company
I have worked with has the same conversation about who owns the
policy.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Security owns&lt;/strong&gt; the enforcement layer: the SSO configuration, the
network blocks on personal accounts, the prompt-scrubbing hooks, the
vulnerability scanners, the SIEM integration. They are the
operational owner of &amp;quot;is the policy actually being followed&amp;quot;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Architecture owns&lt;/strong&gt; the policy itself: what the clauses are, what
tools are approved, what configurations are required, what trade-offs
are acceptable. They are the technical authority on &amp;quot;what should
the policy say&amp;quot;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Legal and procurement own&lt;/strong&gt; the contract: the IP terms, the
indemnity, the data-handling clauses, the residency commitments.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Chief Risk Officer or equivalent owns&lt;/strong&gt; the residual risk
acceptance: signs off that the policy as written is consistent with
the company&#39;s risk appetite.&lt;/p&gt;
&lt;p&gt;When this is unclear, the policy drifts. Security writes a policy
that is too restrictive because they cannot model the development
workflow. Architecture writes a policy that is too permissive
because they cannot model the residual risk. Legal writes a policy
that is unimplementable because they cannot model the tool&#39;s
actual capability. The four functions need to be in the same room
when the policy is written.&lt;/p&gt;
&lt;h2&gt;A transition plan&lt;/h2&gt;
&lt;p&gt;If your company is currently in one of the two failure states (ban
or rubber-stamp), here is how to get to a working policy in roughly
ninety days.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Days 1–14: Inventory the actual usage.&lt;/strong&gt;
Survey developers anonymously about which AI tools they are using,
on which devices, with what data. Most companies are shocked by the
results. The right baseline is honesty about what is already
happening, not what the policy nominally allows.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Days 14–30: Draft the six clauses against your context.&lt;/strong&gt;
Use the framework above as a starting point. Specifics vary by
industry, by jurisdiction, and by sensitivity of the codebase. Convene
architecture, security, legal, procurement, and risk in the same
room. Write the policy in language your developers can understand.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Days 30–60: Negotiate the enterprise contract for one tool.&lt;/strong&gt;
Pick one tool to standardise on first. Multiple tools are fine
later; one is enough to begin. Negotiate the enterprise contract
to match the clauses. Be willing to walk away from a vendor that
will not meet the data residency or logging requirements.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Days 60–75: Deploy the enforcement layer.&lt;/strong&gt;
SSO configuration. Network blocks on free tiers. Pre-commit hooks.
Prompt scrubbing. SIEM integration. The plumbing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Days 75–90: Roll out, observe, iterate.&lt;/strong&gt;
Phased deployment to one engineering team, then the rest. The first
team is the canary; observe what breaks. The policy will need
adjustment in the first month. Plan for that.&lt;/p&gt;
&lt;p&gt;After ninety days you will have a working policy, an enforcement
layer, an audit trail, and a defensible position when the next
regulatory cycle starts.&lt;/p&gt;
&lt;h2&gt;What this is, and what it is not&lt;/h2&gt;
&lt;p&gt;This is a policy framework. It is not a security strategy. It is
not an AI strategy. It is a specific governance layer aimed at one
specific question: how does this company use AI coding tools in a
way that is auditable, contractually sound, and consistent with the
regulatory environment.&lt;/p&gt;
&lt;p&gt;It assumes a regulated context. If you are running a B2B SaaS with
no regulated customer base, several of these clauses are overkill.
If you are running a defence contractor or a systemically important
financial institution, several of these clauses are not strict
enough. Calibrate to context.&lt;/p&gt;
&lt;p&gt;It also assumes the tools will get better. They will. The policy
needs to be revisable. Quarterly review of the approved-tools list,
annual review of the contract terms, periodic spot-checks on the
enforcement layer. Treat it as a living document.&lt;/p&gt;
&lt;p&gt;The point is not to slow down AI adoption. The point is the
opposite: a working policy lets a company adopt AI coding tools at
scale with the regulatory exposure controlled. Companies without a
working policy slow down anyway, because every team makes the
governance decision individually and badly. A working policy
removes the ambiguity and lets the development organisation actually
get on with it.&lt;/p&gt;
&lt;p&gt;If you are about to publish your &amp;quot;AI policy&amp;quot; this quarter and have
not yet written the six clauses with the specificity above, push
the publish date.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>How do you audit a decision an agent made? A working framework</title>
    <link href="https://tarun.bulchandanis.com/blog/auditing-agent-decisions/"/>
    <id>https://tarun.bulchandanis.com/blog/auditing-agent-decisions/</id>
    <updated>2026-05-21T00:00:00.000Z</updated>
    <published>2026-05-21T00:00:00.000Z</published>
    <author>
      <name>Tarun Bulchandani</name>
    </author>
    <summary>Most AI governance frameworks operate at the level of policies and intent. They don&#39;t survive contact with an actual regulator. Here is a concrete, code-level pattern for making agentic systems auditable in production, in regulated industries, with examples.</summary>
    <content type="html">&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;p&gt;The single hardest unsolved problem with deploying AI agents into
regulated enterprises is not capability, latency, hallucination, or
cost. It is auditability. When General Counsel, the Chief Compliance
Officer, or a regulator asks &amp;quot;show me, in full, what this system
told my employee on March 12 at 14:32, what data it looked at when
it produced that answer, and what action was taken as a result&amp;quot;,
most agentic systems in production today cannot answer the question.
This is a design failure, not an inevitability.&lt;/p&gt;
&lt;p&gt;The framework that follows treats audit as four distinct layers
that must each be captured separately and verifiably: the &lt;strong&gt;request&lt;/strong&gt;
(what was asked), the &lt;strong&gt;context&lt;/strong&gt; (what the model was given), the
&lt;strong&gt;generation&lt;/strong&gt; (what the model produced), and the &lt;strong&gt;action&lt;/strong&gt; (what
the system did with the output). Each layer has a specific data
model, a specific storage discipline, and a specific failure mode.
This is the pattern I built into &lt;a href=&quot;/blog/case-study-meridian/&quot;&gt;Meridian&lt;/a&gt;
and into &lt;a href=&quot;/blog/case-study-canvas/&quot;&gt;CANVAS&lt;/a&gt;, and it is the pattern
I would carry into any agent deployment in a regulated environment.&lt;/p&gt;
&lt;p&gt;If you are running an agent in production right now and any of the
four layers is missing, your audit story does not actually work.
This piece walks through the implementation.&lt;/p&gt;
&lt;h2&gt;Why most AI governance frameworks don&#39;t survive contact&lt;/h2&gt;
&lt;p&gt;There are now several reasonable-quality AI governance frameworks
in the public domain: the NIST AI RMF, the EU AI Act compliance
guidance, the various sector-specific overlays (FCA&#39;s discussion
papers on AI in financial services, the FDA&#39;s draft guidance on
AI/ML medical devices, the Bank of England&#39;s supervisory statements,
and the equivalents in other jurisdictions). They are useful. They
are also, mostly, written at the level of policies, principles,
and intended outcomes — not at the level of the data structures
and code paths that determine whether the policy is actually
implementable.&lt;/p&gt;
&lt;p&gt;This produces a familiar pattern. The Risk function publishes a
sound-looking AI policy. Architecture nods along. Engineering ships
the agent. Six months later, the first proper audit happens. The
auditor asks for the records that the policy implies should exist.
The records don&#39;t exist, or they exist in five different systems,
or they exist but cannot be linked together because the system
that called the LLM didn&#39;t log the trace ID that the system that
took the action recorded.&lt;/p&gt;
&lt;p&gt;Audit is not a policy problem. Audit is an instrumentation problem.
Instrumentation has to be designed in. Retrofitting it is expensive
and produces a worse result.&lt;/p&gt;
&lt;p&gt;This piece is the instrumentation framework I would build into any
agentic system that needs to survive regulatory scrutiny.&lt;/p&gt;
&lt;h2&gt;The four layers&lt;/h2&gt;
&lt;p&gt;Every agent decision sits on top of four distinct artefacts that
must be captured separately:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;        ┌─────────────┐
        │   REQUEST   │ ← what the user asked, in what context, with what permissions
        └──────┬──────┘
               │
               ▼
        ┌─────────────┐
        │   CONTEXT   │ ← what the model was given to work with (retrieval, tools, system prompt)
        └──────┬──────┘
               │
               ▼
        ┌─────────────┐
        │  GENERATION │ ← what the model produced (raw output, structured parse, confidence)
        └──────┬──────┘
               │
               ▼
        ┌─────────────┐
        │    ACTION   │ ← what the system did as a result (writes, side effects, downstream calls)
        └─────────────┘
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The auditor&#39;s question — &amp;quot;what happened on March 12&amp;quot; — is actually
four sub-questions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;What was the user trying to do?&lt;/strong&gt; (request layer)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What information did the model see when it decided?&lt;/strong&gt; (context layer)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What did the model actually say?&lt;/strong&gt; (generation layer)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What did the system then do?&lt;/strong&gt; (action layer)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If any of those four cannot be answered with high fidelity and
linked back to the others through a stable identifier, your audit
is broken. The discipline is to instrument each
layer separately, capture it deterministically, and tie them
together with a trace ID that propagates end to end.&lt;/p&gt;
&lt;p&gt;The rest of this piece walks through each layer in turn.&lt;/p&gt;
&lt;h2&gt;Layer 1: the request&lt;/h2&gt;
&lt;p&gt;What needs to be captured:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Trace ID&lt;/strong&gt;. A UUID generated at the entry point of the request,
propagated through every downstream call. This is the spine of
the whole audit record. Without it, you can capture every layer
perfectly and still not be able to link them.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Actor identity&lt;/strong&gt;. The authenticated user, including the
identity-provider claims that were validated at the gateway. Not
just &amp;quot;user X&amp;quot; but &amp;quot;user X, authenticated via OIDC against IdP Y,
with claims {department: Z, role: W}, at 14:32:07 UTC&amp;quot;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The literal request&lt;/strong&gt;. Whatever the user actually typed, asked,
or submitted. Stored verbatim. Not summarised, not cleaned, not
sanitised. If the user pasted an SSN into the chat by accident,
you want to know that — both because you may need to scrub it
downstream and because the question of &amp;quot;did the system handle a
PII-bearing prompt&amp;quot; is itself an auditable event.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Request context that the system used&lt;/strong&gt;. Was this an authenticated
API call, a chat session, a scheduled job? Was the user inside
the company network, or remote? Which tenant, if you are multi-tenant?&lt;/li&gt;
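&lt;li&gt;&lt;strong&gt;Wall-clock timestamp&lt;/strong&gt;. UTC, to the millisecond. Plus the
system&#39;s own monotonic clock if you have one. Wall clocks drift and
can be stepped backwards by time-sync corrections; a monotonic clock
never runs backwards, so it preserves ordering within a single
process.&lt;/li&gt;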
&lt;li&gt;&lt;strong&gt;Permissions snapshot&lt;/strong&gt;. The set of permissions the user held at
the moment of the request. Not &amp;quot;the user&#39;s current permissions&amp;quot; —
permissions change — but the snapshot that was used to authorise
the call. This is the protection against the &amp;quot;the user used to
have access to that data&amp;quot; defence.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The request layer is the easiest of the four to capture well, and
the most commonly captured badly. The two failure modes I see most
often:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Trace ID is generated downstream&lt;/strong&gt;, not at the gateway. This
means a system-internal failure (the LLM call timing out, a retry,
a fallback path) produces a different trace ID than the original
request. The audit log shows two trace IDs for what is, from the
user&#39;s perspective, one event. Always generate the trace ID at the
outermost entry point and propagate it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The user&#39;s literal input is paraphrased or stripped of metadata
before logging&lt;/strong&gt;. Often done with good intent — to remove PII or
to compress the log. Bad practice. Capture the original; redact
in views, not in storage. Storage redaction is a one-way operation
that destroys evidence.&lt;/li&gt;
&lt;/ul&gt;
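&lt;p&gt;As a minimal sketch of a request-layer record, the statement below
writes one row in the shape of the &lt;code&gt;audit_log&lt;/code&gt; table defined later
in this piece. The UUIDs, the &lt;code&gt;chat_message&lt;/code&gt; action name, the
&lt;code&gt;request_context&lt;/code&gt; keys and the &lt;code&gt;monotonic_ns&lt;/code&gt; field are
illustrative assumptions, not values from a real system. The trace ID
is assumed to have been generated at the gateway and passed in, not
generated by the database.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Sketch only: every literal value below is hypothetical.
INSERT INTO audit_log
  (id, trace_id, layer, occurred_at, actor_id, actor_type, action, payload)
VALUES (
  gen_random_uuid(),                          -- built in from PostgreSQL 13
  &#39;3f2b6c1e-8a4d-4e7b-9c05-1d2e3f4a5b6c&#39;,     -- trace ID minted at the gateway
  &#39;REQUEST&#39;,
  now(),                                      -- ideally the gateway&#39;s own UTC timestamp
  &#39;9d8c7b6a-5f4e-4d3c-8b2a-1f0e9d8c7b6a&#39;,     -- authenticated user
  &#39;USER&#39;,
  &#39;chat_message&#39;,
  jsonb_build_object(
    &#39;request_text&#39;,         &#39;which apps in the Finance domain ...&#39;,  -- verbatim input
    &#39;request_context&#39;,      jsonb_build_object(&#39;channel&#39;, &#39;chat&#39;, &#39;network&#39;, &#39;corporate&#39;),
    &#39;permissions_snapshot&#39;, jsonb_build_array(&#39;portfolio:read&#39;, &#39;ai_assistant:use&#39;),
    &#39;idp_claims&#39;,           jsonb_build_object(&#39;tenant&#39;, &#39;main&#39;, &#39;department&#39;, &#39;Architecture&#39;),
    &#39;monotonic_ns&#39;,         123456789012
  )
);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The permissions and claims are copied into the row as they stood at
authorisation time, so later changes to the user&#39;s access do not
alter the record.&lt;/p&gt;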
&lt;h2&gt;Layer 2: the context&lt;/h2&gt;
&lt;p&gt;What needs to be captured:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The system prompt&lt;/strong&gt;, in full. Not just a reference to &amp;quot;system
prompt v3&amp;quot; — the actual text that was sent to the model. System
prompts change. In theory the text could be recovered later
from a prompt-cache key, but in practice you want the full text in the
audit record so the audit doesn&#39;t depend on the cache key still
resolving in three years&#39; time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The retrieved context&lt;/strong&gt;. Whatever the RAG layer pulled in.
Specifically: which documents were retrieved, with what IDs at
what versions, in what order, with what similarity scores, and
what the actual content of each retrieved chunk was. The chunk
content matters because retrieved data can change underneath you
— a document gets updated, a record gets soft-deleted, an embedding
index is rebuilt. The audit record needs the data as the model
saw it, not the data as it exists now.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool definitions&lt;/strong&gt;, if the model was given tools. The schema of
every tool the model could have called. Tools change too. The set
of tools available to the agent on March 12 may not be the set
available today.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Conversation history&lt;/strong&gt;, if this was a multi-turn interaction.
Captured turn by turn, with trace IDs linking back to earlier
requests so the full thread can be reconstructed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model identifier&lt;/strong&gt;, including the exact version. &amp;quot;Claude&amp;quot; is
not enough. &amp;quot;claude-opus-4-7&amp;quot; is enough. Model versions change.
Behaviour changes with them. The audit record needs to know which
version made the call.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sampling parameters&lt;/strong&gt;. Temperature, top_p, top_k, max_tokens,
any stop sequences, any structured-output schemas. Bit-for-bit
reproducibility isn&#39;t guaranteed with most hosted LLMs, but the
parameters that influence
the distribution of outputs are part of the audit story.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The two failure modes I see most often at this layer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The retrieved context is referenced but not stored&lt;/strong&gt;. The audit
log says &amp;quot;retrieved 3 documents, IDs 47, 92, 318&amp;quot; but doesn&#39;t
include the content of those documents at the time of retrieval.
Then the documents change. The audit record is now ambiguous —
you cannot tell whether the model&#39;s response was reasonable
given what it actually saw.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The system prompt is stored as a reference, not as text&lt;/strong&gt;. The
audit log says &amp;quot;system prompt: meridian.v3&amp;quot;, and meridian.v3 is a
pointer to a config file that has since been updated. The audit
is unreplayable. Always inline the system prompt text.&lt;/li&gt;
&lt;/ul&gt;
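&lt;p&gt;Both failure modes can be partly guarded against in the database.
The sketch below assumes the &lt;code&gt;audit_log&lt;/code&gt; table and payload key names
given later in this piece; the constraint name is hypothetical. The
check is structural only: it cannot tell a pointer such as
&amp;quot;meridian.v3&amp;quot; from real prompt text, so it complements code review
and does not replace it.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Reject CONTEXT rows that omit the prompt text or the retrieved chunks.
ALTER TABLE audit_log
  ADD CONSTRAINT context_payload_is_inlined CHECK (
    layer &amp;lt;&amp;gt; &#39;CONTEXT&#39;
    OR (
      payload ?&amp;amp; ARRAY[&#39;system_prompt&#39;, &#39;retrieved_chunks&#39;, &#39;model&#39;, &#39;parameters&#39;]
      AND jsonb_typeof(payload -&amp;gt; &#39;system_prompt&#39;) = &#39;string&#39;
      AND jsonb_typeof(payload -&amp;gt; &#39;retrieved_chunks&#39;) = &#39;array&#39;
    )
  );

-- Find retrieved chunks that were logged by ID only, with no content.
SELECT a.trace_id,
       c.chunk -&amp;gt;&amp;gt; &#39;id&#39; AS chunk_id
FROM audit_log AS a
CROSS JOIN LATERAL
  jsonb_array_elements(a.payload -&amp;gt; &#39;retrieved_chunks&#39;) AS c(chunk)
WHERE a.layer = &#39;CONTEXT&#39;
  AND COALESCE(c.chunk -&amp;gt;&amp;gt; &#39;content&#39;, &#39;&#39;) = &#39;&#39;;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The key-existence test comes first in the constraint because a
missing key would otherwise evaluate to NULL, and PostgreSQL treats a
NULL check result as passing.&lt;/p&gt;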
&lt;h2&gt;Layer 3: the generation&lt;/h2&gt;
&lt;p&gt;What needs to be captured:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The raw model output&lt;/strong&gt;, verbatim. Whatever bytes came back from
the model. No formatting, no cleaning, no post-processing applied
yet.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The structured parse&lt;/strong&gt;, if the system extracted structured data
from the output. Both the parsed structure and any validation
errors that occurred during parsing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool calls made by the model&lt;/strong&gt;, if applicable. Which tools the
model called, with what arguments, in what order, and what each
tool returned. Tool calls produce their own sub-audit records,
linked by the trace ID and a sequence number.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Latency&lt;/strong&gt;. How long the model took. Not because latency is
inherently auditable, but because a model call that took 30 seconds
when it normally takes 3 is a signal that something was unusual
about that particular generation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost&lt;/strong&gt;, if you are tracking it. Input tokens, output tokens,
cache reads, cache writes. The economic record is part of the
audit record because cost is often the first place anomalies
show up.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The failure modes here are mostly about post-processing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The system stores the cleaned output instead of the raw output&lt;/strong&gt;.
Markdown got rendered to HTML before logging. Citation markers
got stripped. The output that the LLM actually produced
is no longer present in the audit record, only the version that
the rendering layer produced. Always log raw first, render later.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool calls are logged as completed actions, not as model
decisions&lt;/strong&gt;. The audit log shows &amp;quot;the system updated record 42&amp;quot;,
not &amp;quot;the model decided to call updateRecord(42) and the tool
succeeded&amp;quot;. For agent audit, the decision is the audit-relevant
event, not just the outcome.&lt;/li&gt;
&lt;/ul&gt;
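&lt;p&gt;Where tool calls are also kept inline in the generation payload,
their order can be recovered with a query like the sketch below. It
assumes the &lt;code&gt;audit_log&lt;/code&gt; table defined later in this piece and assumes
each element of &lt;code&gt;tool_calls&lt;/code&gt; carries &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;arguments&lt;/code&gt; and
&lt;code&gt;result&lt;/code&gt; keys; those key names are hypothetical. Writing one audit
row per tool call would carry the same sequence number explicitly.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Assumed element shape: { name, arguments, result } (hypothetical keys).
PREPARE tool_calls_for_trace(uuid) AS
SELECT a.occurred_at,
       t.seq,                                         -- position in the model&#39;s call order
       t.tool_call -&amp;gt;&amp;gt; &#39;name&#39;      AS tool_name,
       t.tool_call -&amp;gt;  &#39;arguments&#39; AS arguments,  -- what the model decided to pass
       t.tool_call -&amp;gt;  &#39;result&#39;    AS result      -- what the tool returned
FROM audit_log AS a
CROSS JOIN LATERAL
  jsonb_array_elements(a.payload -&amp;gt; &#39;tool_calls&#39;)
    WITH ORDINALITY AS t(tool_call, seq)
WHERE a.layer = &#39;GENERATION&#39;
  AND a.trace_id = $1
ORDER BY a.occurred_at, t.seq;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Keeping &lt;code&gt;arguments&lt;/code&gt; and &lt;code&gt;result&lt;/code&gt; as separate fields is what lets
the record distinguish the model&#39;s decision from the tool&#39;s outcome.&lt;/p&gt;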
&lt;h2&gt;Layer 4: the action&lt;/h2&gt;
&lt;p&gt;What needs to be captured:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;What the system did with the output&lt;/strong&gt;. Did it write to a
database? Send an email? Update a workflow stage? Call an external
API? Each of these is an auditable event in its own right and
needs to be captured with the same discipline as the LLM call.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The before-and-after state&lt;/strong&gt;, for any write. The audit_log table
in CANVAS uses a JSONB column for &lt;code&gt;before_state&lt;/code&gt; and another for
&lt;code&gt;after_state&lt;/code&gt;. The diff between them is the auditable change.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The human-in-the-loop record&lt;/strong&gt;, if there was one. Did a person
review and approve the model&#39;s suggested action before it
executed? If yes, capture who, when, and what they were shown.
If no — if the action was fully automated — capture that fact
explicitly. &amp;quot;Auto-executed&amp;quot; is a critical audit datum.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The downstream effects&lt;/strong&gt;, if any. If the action triggered
notifications, scheduled jobs, or further agent calls, those
effects are part of the audit chain. Trace ID continues to
propagate.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The action layer is where most agentic systems either achieve genuine
auditability or fail to. The failure modes are subtle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Actions are logged in the application database but not linked
to the trace ID&lt;/strong&gt;. The application says &amp;quot;record 42 was updated at
14:32 by automation&amp;quot;. The audit log says &amp;quot;trace ID abc made a
model call at 14:32&amp;quot;. Without a stable link between the two,
you cannot prove which model call caused which database update.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The human-in-the-loop step exists but is not recorded as part
of the agent decision chain&lt;/strong&gt;. There is a separate approval system
that records human sign-offs, but it does not store the trace ID
of the model call that produced the suggestion. So &amp;quot;the human
approved this&amp;quot; exists in one log; &amp;quot;the model suggested this&amp;quot;
exists in another; nothing links them.&lt;/li&gt;
&lt;/ul&gt;
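&lt;p&gt;When the trace ID is carried into the action row, the question
&amp;quot;which model call caused this change, and did a human approve it&amp;quot;
becomes a single self-join. This is a sketch against the
&lt;code&gt;audit_log&lt;/code&gt; table and payload keys defined later in this piece; the
prepared-statement name is hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- For one changed entity, show each action next to the generation that led to it.
PREPARE explain_change(uuid) AS
SELECT act.trace_id,
       act.occurred_at                              AS acted_at,
       act.payload -&amp;gt;&amp;gt; &#39;action_type&#39;          AS action_type,
       (act.payload -&amp;gt;&amp;gt; &#39;automated&#39;)::boolean AS automated,
       act.payload -&amp;gt;&amp;gt; &#39;approved_by_id&#39;       AS approved_by_id,
       gen.occurred_at                              AS generated_at,
       gen.payload -&amp;gt;&amp;gt; &#39;raw_output&#39;           AS model_output
FROM audit_log AS act
LEFT JOIN audit_log AS gen
       ON gen.trace_id = act.trace_id
      AND gen.layer = &#39;GENERATION&#39;
WHERE act.layer = &#39;ACTION&#39;
  AND act.entity_id = $1
ORDER BY act.occurred_at;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;LEFT JOIN&lt;/code&gt; is deliberate: an action row that comes back with
no generation beside it is exactly the broken link described above.&lt;/p&gt;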
&lt;h2&gt;The append-only audit table&lt;/h2&gt;
&lt;p&gt;The architectural pattern that holds all four layers together is an
append-only audit table. Strictly: never updated, never deleted.
Insert-only privileges on the application database user. Indexed
heavily on &lt;code&gt;trace_id&lt;/code&gt;, &lt;code&gt;actor_id&lt;/code&gt;, &lt;code&gt;occurred_at&lt;/code&gt;, and
&lt;code&gt;entity_id&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;A minimum-viable schema (PostgreSQL, but the structure is portable):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;CREATE TABLE audit_log (
  id              UUID PRIMARY KEY,
  trace_id        UUID NOT NULL,
  layer           VARCHAR(20) NOT NULL,    -- REQUEST | CONTEXT | GENERATION | ACTION
  occurred_at     TIMESTAMPTZ NOT NULL,
  actor_id        UUID,                    -- nullable for SYSTEM actions
  actor_type      VARCHAR(20) NOT NULL,    -- USER | SYSTEM | MODEL
  action          VARCHAR(255) NOT NULL,
  entity_type     VARCHAR(100),
  entity_id       UUID,
  payload         JSONB NOT NULL,          -- layer-specific content
  ip_address      INET,
  user_agent      TEXT
);

CREATE INDEX idx_audit_trace ON audit_log (trace_id, occurred_at);
CREATE INDEX idx_audit_actor ON audit_log (actor_id, occurred_at);
CREATE INDEX idx_audit_entity ON audit_log (entity_type, entity_id);
&lt;/code&gt;&lt;/pre&gt;
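&lt;p&gt;One way to enforce the append-only rule inside PostgreSQL is
sketched below. The role names &lt;code&gt;audit_writer&lt;/code&gt; and &lt;code&gt;app_user&lt;/code&gt; and
the function and trigger names are hypothetical, and the sketch
assumes the table is owned by a role the application cannot log in as.
&lt;code&gt;EXECUTE FUNCTION&lt;/code&gt; needs PostgreSQL 11 or later.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Hypothetical roles: app_user is assumed to be the application&#39;s login role.
CREATE ROLE audit_writer NOLOGIN;
REVOKE ALL ON audit_log FROM PUBLIC, app_user;
GRANT INSERT ON audit_log TO audit_writer;
GRANT audit_writer TO app_user;

-- Second line of defence: reject mutations even from roles with wider grants.
CREATE FUNCTION audit_log_reject_mutation() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  RAISE EXCEPTION &#39;audit_log is append-only; % rejected&#39;, TG_OP;
END;
$$;

CREATE TRIGGER audit_log_no_update_delete
  BEFORE UPDATE OR DELETE ON audit_log
  FOR EACH ROW EXECUTE FUNCTION audit_log_reject_mutation();

CREATE TRIGGER audit_log_no_truncate
  BEFORE TRUNCATE ON audit_log
  FOR EACH STATEMENT EXECUTE FUNCTION audit_log_reject_mutation();
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The table owner or a superuser can still drop the triggers, so this
protects against application bugs, not against a privileged insider.&lt;/p&gt;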
&lt;p&gt;The &lt;code&gt;payload&lt;/code&gt; column is JSONB because each layer has a different
shape. Use a discriminated union in your application code:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Payload schema&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;REQUEST&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ request_text, request_context, permissions_snapshot, idp_claims }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CONTEXT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ system_prompt, retrieved_chunks: [...], tools: [...], model: &amp;quot;claude-opus-4-7&amp;quot;, parameters: {...} }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GENERATION&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ raw_output, parsed_structure, tool_calls: [...], latency_ms, input_tokens, output_tokens }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ACTION&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ action_type, target, before_state, after_state, automated: bool, approved_by_id?, approval_trace_id? }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The discipline is that &lt;strong&gt;every layer for every request produces at
least one audit row&lt;/strong&gt;, and every row carries the trace_id that
threads them together.&lt;/p&gt;
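&lt;p&gt;That discipline can be checked mechanically. The query below is a
sketch that lists traces missing at least one layer; it assumes
&lt;code&gt;layer&lt;/code&gt; only ever holds the four values above, and the five-minute
grace window for in-flight requests is an arbitrary assumption.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Traces that never produced all four layers.
SELECT trace_id,
       array_agg(DISTINCT layer ORDER BY layer) AS layers_present,
       min(occurred_at)                         AS first_seen
FROM audit_log
GROUP BY trace_id
HAVING count(DISTINCT layer) &amp;lt; 4
   AND max(occurred_at) &amp;lt; now() - interval &#39;5 minutes&#39;  -- skip in-flight requests
ORDER BY first_seen;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Run on a schedule, an empty result is itself evidence that the
instrumentation is complete.&lt;/p&gt;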
&lt;h2&gt;The eval question, which is also an audit question&lt;/h2&gt;
&lt;p&gt;Evals are usually framed as a quality concern. They are also an
audit concern, and the audit framing changes how you design them.&lt;/p&gt;
&lt;p&gt;The standard eval setup runs a test suite against the model on a
schedule and produces a quality score. The audit framing asks a
different question: when the regulator asks &amp;quot;how do you know your
agent was performing within spec on March 12&amp;quot;, what is your
evidence?&lt;/p&gt;
&lt;p&gt;The answer is the eval log. For every production deployment of a
model, you should have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An eval suite that runs against the model version currently in
production.&lt;/li&gt;
&lt;li&gt;A schedule (typically nightly) that runs the suite and records
results.&lt;/li&gt;
&lt;li&gt;A persistent record of every run, including the eval suite version,
the model version, the prompts, the expected outputs, and the
actual outputs.&lt;/li&gt;
&lt;li&gt;An alert that fires when scores drop below a threshold.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When the regulator asks about March 12, the answer is: &amp;quot;on March 12,
the eval suite was at version 1.4.2, the model was claude-opus-4-7,
the suite ran at 02:00 UTC and scored 94.7% against a 90% threshold,
no alerts fired, and the previous seven days of results were
between 93.1% and 95.4%&amp;quot;. That is an audit-grade answer to a
quality question.&lt;/p&gt;
&lt;p&gt;The eval log lives in the same kind of append-only structure as the
production audit log, with cross-references where useful (a sampled
production query can be added to the eval set; the eval set can
reference production failures).&lt;/p&gt;
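&lt;p&gt;A minimal sketch of such an eval log follows. The table name,
column names and prepared-statement name are assumptions for
illustration; the columns mirror the items in the list above.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Hypothetical eval log; append-only in the same way as audit_log.
CREATE TABLE eval_run (
  id             UUID PRIMARY KEY,
  started_at     TIMESTAMPTZ NOT NULL,
  suite_version  TEXT NOT NULL,
  model          TEXT NOT NULL,           -- exact model version under test
  score_pct      NUMERIC(5,2) NOT NULL,
  threshold_pct  NUMERIC(5,2) NOT NULL,
  alert_fired    BOOLEAN NOT NULL,
  cases          JSONB NOT NULL           -- per case: prompt, expected output, actual output
);

-- Evidence for one UTC day plus the seven days before it.
PREPARE eval_evidence(date) AS
SELECT started_at, suite_version, model, score_pct, threshold_pct, alert_fired
FROM eval_run
WHERE started_at &amp;gt;= ($1 - 7)::timestamp AT TIME ZONE &#39;UTC&#39;
  AND started_at &amp;lt;  ($1 + 1)::timestamp AT TIME ZONE &#39;UTC&#39;
ORDER BY started_at;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Storing the threshold on each run matters: thresholds change, and
the question is whether the score met the threshold in force on that
day.&lt;/p&gt;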
&lt;h2&gt;What to redact, when, and where&lt;/h2&gt;
&lt;p&gt;A common worry: &amp;quot;if I store the literal user input and the full
model output, am I now sitting on a pile of PII that is itself an
audit liability?&amp;quot;&lt;/p&gt;
&lt;p&gt;Yes. This is real and it has to be designed for.&lt;/p&gt;
&lt;p&gt;The principle: &lt;strong&gt;redact at the view, not at the store&lt;/strong&gt;. The audit
log stores raw. Views over the audit log apply role-based redaction:
the application UI shows the user a summarised version; the internal
operations dashboard shows authorised staff a more complete version;
the regulator-facing export, on request, shows the full record with
appropriate access controls.&lt;/p&gt;
&lt;p&gt;The redaction logic lives in the view layer and is itself auditable.
&amp;quot;User X viewed audit record Y on date Z&amp;quot; is an audit event. The
record of who has accessed sensitive parts of the audit log is itself
audit-grade.&lt;/p&gt;
&lt;p&gt;The reason this matters: if your application logs are themselves
redacted at the point of storage, you cannot un-redact them later
when, for example, a different regulator asks a different question
with a wider remit. View-time redaction preserves the option;
storage-time redaction destroys it.&lt;/p&gt;
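&lt;p&gt;A sketch of view-time redaction in PostgreSQL is shown below. The
view name &lt;code&gt;audit_log_ops&lt;/code&gt; and the role &lt;code&gt;ops_dashboard&lt;/code&gt; are
hypothetical, and masking whole fields is a simplification; a real
deployment would more likely mask specific PII patterns.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- The base table stays raw; only the view redacts.
-- ip_address and user_agent are left out of this view on purpose.
CREATE VIEW audit_log_ops AS
SELECT id, trace_id, layer, occurred_at, actor_id, actor_type,
       action, entity_type, entity_id,
       CASE layer
         WHEN &#39;REQUEST&#39;
           THEN payload || jsonb_build_object(&#39;request_text&#39;, &#39;[REDACTED]&#39;)
         WHEN &#39;GENERATION&#39;
           THEN payload || jsonb_build_object(&#39;raw_output&#39;, &#39;[REDACTED]&#39;)
         ELSE payload
       END AS payload
FROM audit_log;

-- The dashboard role reads the view and has no grant on the base table.
GRANT SELECT ON audit_log_ops TO ops_dashboard;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;PostgreSQL does not record reads through a view by itself, so the
&amp;quot;User X viewed audit record Y&amp;quot; event still has to be written by
whatever service sits in front of the view.&lt;/p&gt;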
&lt;h2&gt;A worked example&lt;/h2&gt;
&lt;p&gt;Imagine an agent that helps an internal user query the application
portfolio. The user asks: &amp;quot;which apps in the Finance domain process
European personal data and have a contract renewal due before
year end?&amp;quot;&lt;/p&gt;
&lt;p&gt;A complete audit record for this single interaction looks like
this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;trace_id: 7f3a8c2e-...

[14:32:07.123] REQUEST
  actor_id: tarun-...
  actor_type: USER
  payload:
    request_text: &amp;quot;which apps in the Finance domain ...&amp;quot;
    permissions_snapshot: [&amp;quot;portfolio:read&amp;quot;, &amp;quot;ai_assistant:use&amp;quot;]
    idp_claims: { tenant: &amp;quot;main&amp;quot;, department: &amp;quot;Architecture&amp;quot; }

[14:32:07.456] CONTEXT
  payload:
    system_prompt: &amp;quot;You are an enterprise architecture assistant ...&amp;quot;
    model: &amp;quot;claude-opus-4-7&amp;quot;
    parameters: { effort: &amp;quot;high&amp;quot;, thinking: { type: &amp;quot;adaptive&amp;quot; } }
    retrieved_chunks:
      - { id: &amp;quot;app-042&amp;quot;, title: &amp;quot;...&amp;quot;, content: &amp;quot;...&amp;quot;, score: 0.92 }
      - { id: &amp;quot;app-119&amp;quot;, title: &amp;quot;...&amp;quot;, content: &amp;quot;...&amp;quot;, score: 0.88 }
      - { id: &amp;quot;app-208&amp;quot;, title: &amp;quot;...&amp;quot;, content: &amp;quot;...&amp;quot;, score: 0.84 }
    tools: [&amp;quot;search_portfolio&amp;quot;, &amp;quot;filter_by_attributes&amp;quot;]

[14:32:11.892] GENERATION
  payload:
    raw_output: &amp;quot;Three apps match your criteria: ...&amp;quot;
    parsed_structure: { matches: [&amp;quot;app-042&amp;quot;, &amp;quot;app-119&amp;quot;, &amp;quot;app-208&amp;quot;], reasoning: &amp;quot;...&amp;quot; }
    latency_ms: 4436
    input_tokens: 1247
    output_tokens: 312
    tool_calls: []

[14:32:12.001] ACTION
  payload:
    action_type: &amp;quot;render_response&amp;quot;
    target: &amp;quot;chat_session_...&amp;quot;
    automated: true
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is the audit-grade view of one interaction. When the regulator
asks about it three years later, every layer can be reconstructed.
The system prompt is inlined. The retrieved chunks are inlined. The
raw output is inlined. The action is recorded. The trace ID threads
everything together.&lt;/p&gt;
&lt;p&gt;The cost of this is storage and a small amount of write-time latency.
For a typical enterprise agent, audit log volume is on the order of
single-digit megabytes per day. Cheap relative to the value of
being able to answer the regulator&#39;s question.&lt;/p&gt;
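&lt;p&gt;Both claims can be checked against the table directly. The sketch
below assumes the &lt;code&gt;audit_log&lt;/code&gt; schema above; the trace ID passed to
&lt;code&gt;EXECUTE&lt;/code&gt; is a made-up value, and the statement name is
hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Reconstruct one interaction, layer by layer, in time order.
PREPARE replay_trace(uuid) AS
SELECT occurred_at, layer, actor_type, action,
       jsonb_pretty(payload) AS payload
FROM audit_log
WHERE trace_id = $1
ORDER BY occurred_at, id;

-- Hypothetical trace ID for illustration.
EXECUTE replay_trace(&#39;3f2b6c1e-8a4d-4e7b-9c05-1d2e3f4a5b6c&#39;);

-- Measure the daily payload volume instead of estimating it.
SELECT date_trunc(&#39;day&#39;, occurred_at)                 AS day,
       count(*)                                      AS audit_rows,
       pg_size_pretty(sum(pg_column_size(payload)))  AS payload_size
FROM audit_log
GROUP BY 1
ORDER BY 1;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;pg_column_size&lt;/code&gt; reports the stored, possibly compressed size, so
it is a fair proxy for what the payloads cost on disk.&lt;/p&gt;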
&lt;h2&gt;Failure modes I have seen in production&lt;/h2&gt;
&lt;p&gt;A short list of things that look like they work and don&#39;t:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Audit logs in the application&#39;s main database with the same
privileges as the application user.&lt;/strong&gt; A bug in the application
layer that updates audit rows defeats the whole point. The audit
log table needs &lt;code&gt;INSERT&lt;/code&gt;-only grants. If you can do this with a
separate database role on the same database, that&#39;s fine; better
is a separate write-only log destination (a dedicated event store,
an append-only message log, a write-once-read-many store) that
the application cannot delete from.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A reasonable audit log for the LLM layer but no link to the
database writes the agent caused.&lt;/strong&gt; The model side is fine. The
database side is fine. They are not linked. Always propagate the
trace ID into the database writes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Conversation history stored only on the client side.&lt;/strong&gt; A web
chat that retains conversation in the browser, sends the history
to the model with each turn, but does not store the history
server-side. When the regulator asks &amp;quot;what was the model told&amp;quot;,
the answer is &amp;quot;ask the user, they have it in their browser cache&amp;quot;.
This does not work. Server-side conversation storage is the audit
trail.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Citations in the model&#39;s response that point to documents the
model didn&#39;t actually see.&lt;/strong&gt; This is a hallucination class. It
happens. The defence is in the context layer: every citation in
the output must be verifiable against the retrieved chunks
captured at the context layer. If a citation references a document
not present in the context, that is itself an auditable anomaly
and should be flagged.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Tool calls treated as part of &amp;quot;the response&amp;quot; rather than as
separate audit events.&lt;/strong&gt; The model&#39;s tool calls are decisions.
Each one has its own arguments, its own response, its own latency,
its own success/failure status. Treating them as opaque steps
inside the generation collapses the audit chain. Each tool call
needs its own audit row.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The &amp;quot;automated vs human-approved&amp;quot; flag is missing.&lt;/strong&gt; When the
regulator asks &amp;quot;did a human approve this action&amp;quot;, the answer
needs to be recoverable from the audit log alone. Adding the
&lt;code&gt;automated: bool&lt;/code&gt; and &lt;code&gt;approved_by_id&lt;/code&gt; fields to every action row
is cheap and pays for itself the first time anyone asks.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
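&lt;p&gt;The citation check in particular lends itself to a query. The
sketch below assumes cited document IDs are recorded in
&lt;code&gt;parsed_structure.matches&lt;/code&gt; and that each retrieved chunk carries an
&lt;code&gt;id&lt;/code&gt;, as in the worked example; adapt the paths to wherever your
parser records citations.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Citations that do not match any chunk captured for the same trace.
SELECT gen.trace_id, gen.occurred_at, cited.doc_id
FROM audit_log AS gen
CROSS JOIN LATERAL
  jsonb_array_elements_text(gen.payload -&amp;gt; &#39;parsed_structure&#39; -&amp;gt; &#39;matches&#39;)
    AS cited(doc_id)
WHERE gen.layer = &#39;GENERATION&#39;
  AND NOT EXISTS (
    SELECT 1
    FROM audit_log AS ctx
    CROSS JOIN LATERAL
      jsonb_array_elements(ctx.payload -&amp;gt; &#39;retrieved_chunks&#39;) AS c(chunk)
    WHERE ctx.trace_id = gen.trace_id
      AND ctx.layer = &#39;CONTEXT&#39;
      AND c.chunk -&amp;gt;&amp;gt; &#39;id&#39; = cited.doc_id
  );
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The inner query looks across every context row for the trace, so a
multi-turn interaction with several retrievals does not produce false
alarms.&lt;/p&gt;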
&lt;h2&gt;What this gets you, in practice&lt;/h2&gt;
&lt;p&gt;When the framework above is implemented properly, three things
become trivial that were previously hard.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Replayability.&lt;/strong&gt; Any past decision can be reconstructed. You can
show, exactly, what the model saw and what it produced, without
having to re-run a model whose output may not reproduce. Useful
for debugging, useful for retrospective evals, useful when defending
the system to an auditor.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Anomaly detection.&lt;/strong&gt; Anomalies in any of the four layers are
detectable. A spike in retrieval-confidence variance. A latency
outlier. A tool call with unusual arguments. A run of automated
actions without human approval where there usually is one. These
are all queryable on the audit log.&lt;/p&gt;
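&lt;p&gt;As one example, a latency outlier check might look like the sketch
below. The 30-day window and the ten-times-median rule are arbitrary
assumptions, chosen to match the &amp;quot;30 seconds when it normally takes
3&amp;quot; case described earlier.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Generations whose latency is more than ten times the recent median.
WITH gen AS (
  SELECT trace_id, occurred_at,
         (payload -&amp;gt;&amp;gt; &#39;latency_ms&#39;)::double precision AS latency_ms
  FROM audit_log
  WHERE layer = &#39;GENERATION&#39;
    AND occurred_at &amp;gt;= now() - interval &#39;30 days&#39;
),
baseline AS (
  SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY latency_ms) AS median_ms
  FROM gen
)
SELECT gen.trace_id, gen.occurred_at, gen.latency_ms, baseline.median_ms
FROM gen
CROSS JOIN baseline
WHERE gen.latency_ms &amp;gt; 10 * baseline.median_ms
ORDER BY gen.latency_ms DESC;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A median is used as the baseline because, unlike a mean, it is not
dragged upwards by the very outliers the query is looking for.&lt;/p&gt;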
&lt;p&gt;&lt;strong&gt;Regulatory defensibility.&lt;/strong&gt; When the regulator arrives, the answer
to &amp;quot;show me how this works&amp;quot; is not a slide deck. It is a query
against the audit log that produces an exact, timestamped, sourced
record. The regulator does not need to trust the policy document;
they can read the data.&lt;/p&gt;
&lt;p&gt;This last point is the actual goal. Most AI governance work
produces assurance through documentation. Audit instrumentation
produces assurance through evidence. Evidence is what gets you
through a real audit; documentation is what gets you through a
desk-side review.&lt;/p&gt;
&lt;h2&gt;Where this fits&lt;/h2&gt;
&lt;p&gt;Two related pieces on this site:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/blog/case-study-meridian/&quot;&gt;Meridian: building the EA platform we couldn&#39;t buy&lt;/a&gt; —
describes the broader context and the conversational assistant
the audit framework was designed around.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/blog/case-study-canvas/&quot;&gt;CANVAS: building the approval workflow no commercial product
covers&lt;/a&gt; — describes the workflow side
of the same system, where the action layer of the audit framework
is wired into the application workflow.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you are starting an agent build today and any of the four layers
above is missing from your design, stop and add it before you write
more application code. Retrofitting audit is more expensive than
designing it in. The instrumentation discipline is also the
discipline that makes the system itself better — every layer of
the audit story is also a layer of the system that can be tested,
observed, and improved.&lt;/p&gt;
</content>
  </entry>
</feed>
