AI Governance Is a Deployment Problem
- Roger Roman
- Jan 6
- 7 min read
The first time we deployed an AI-assisted intake system into a legally constrained workflow, the model performed exactly as designed.
The problem was the county clerk's inbox.
A petition that should have been routed automatically sat in a shared email account for three days. The formatting of an attachment had triggered a spam filter. No one noticed until a frustrated applicant called to ask why their case hadn't moved. The AI had done its job. The workflow around it had not.
That moment shifted how I think about AI governance. Not purely as a model problem, but as a systems problem. The distinction matters.
Two Problems, Not One
Most public debate about AI governance focuses on frontier systems: capability thresholds, alignment techniques, catastrophic risk. Those questions matter. Misaligned or misused advanced systems could cause harms at a scale that no deployment framework would contain. The researchers working on those problems are not mistaken about their importance.
But frontier risk and deployment risk are not competing priorities. They are interdependent layers of the same governance challenge. A well-aligned model deployed into a brittle institutional environment can still produce systematic harm. And governance frameworks designed only for the frontier layer will not reach the courts, licensing boards, and benefits offices where AI already intersects daily with people's civil rights and economic mobility.
The asymmetry I want to name is not that frontier safety receives too much attention in absolute terms. It is that the deployment layer receives too little attention relative to where most current public-sector AI actually operates. Procurement design, administrative capacity, auditability, and institutional incentive structures all fall into that gap. And that gap has consequences.
What Constrained Deployment Actually Teaches You
When we built AI-assisted tools for expungement intake, the technical challenge was manageable. Classification and structured document generation are not frontier problems. The harder challenge was institutional: designing a system that respected statutory limits, avoided unauthorized practice of law concerns, and preserved clear lines of authority between software guidance and prosecutorial discretion.
The model could suggest eligibility. It could structure petition language. It could surface relevant statutes. It could not replace the legal authority of a clerk, prosecutor, or judge. Any design that blurred that line would undermine both trust and adoption.
Designing that boundary was a governance decision, not a technical one. It required understanding which institutional actors held legitimate authority, what accountability looked like when a workflow produced an error, and how to preserve auditability when automation compressed decision-making time. Those questions rarely surface in governance debates at the level of operational detail that deployment actually demands.
The Hard Tradeoff
Abstract governance principles become real at the moment of a design decision. Here is one that stays with me.
Early in our deployment, we had the technical capacity to automate eligibility screening entirely. The goal was to move an applicant from intake through a preliminary determination without human review at any intermediate step. That would have meaningfully reduced processing time. For applicants with straightforward cases, it would have improved access and reduced the administrative burden on already-stretched county staff.
We didn't do it.
The reason was not technical limitation. It was that automating that determination, even when correct, would have created an accountability gap we couldn't close. When a determination is wrong, someone must own it. In a fully automated flow, that ownership becomes ambiguous. Is it the vendor? The agency that adopted the tool? The clerk who didn't review an output they were never prompted to review? In a legal domain where errors affect civil rights, ambiguous accountability is not a second-order concern. It is the primary risk.
So we introduced a mandatory review checkpoint that added friction and slowed the process. Some applicants waited longer than they would have otherwise. We preserved institutional accountability at the cost of speed.
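The structure of that checkpoint can be sketched in a few lines. This is a hypothetical simplification, not our production system; the names (`Determination`, `submit`, `review`) are illustrative. The property it encodes is the one described above: no determination leaves the pending state without a named human reviewer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class Status(Enum):
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class Determination:
    # A model-generated eligibility suggestion: advisory until a human signs off.
    case_id: str
    suggested_eligible: bool
    rationale: str
    status: Status = Status.PENDING_REVIEW
    reviewer: Optional[str] = None

def submit(determination: Determination, review_queue: List[Determination]) -> None:
    # Every determination enters the review queue; by design there is
    # no automated path that skips this step.
    review_queue.append(determination)

def review(determination: Determination, reviewer: str, approve: bool) -> Determination:
    # The reviewer's identity is recorded, so ownership of the outcome
    # is never ambiguous.
    determination.reviewer = reviewer
    determination.status = Status.APPROVED if approve else Status.REJECTED
    return determination
```

The friction is deliberate: the type system makes the accountable path the only path.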
I believe that was the right call. I also know it was a tradeoff, not a resolution. There are applicants for whom that delay was not abstract. Governance lives in those choices. Anyone who says otherwise is not wrestling with the problem honestly.
Why Risk Frameworks Stall Before They Land
Federal agencies and serious think tanks have produced thoughtful AI risk management frameworks. NIST's AI RMF is rigorous. OMB's guidance on agency AI use is substantive. The intent is sound. The principles are right.
The friction appears when those frameworks meet local institutional capacity. Not because the frameworks are unrealistic in their ambitions, but because they are under-resourced in their implementation. Specifically, they tend to assume three things that many local agencies do not have: dedicated audit staff capable of evaluating AI system outputs; technical procurement literacy sufficient to assess vendor claims; and inter-agency coordination mechanisms that allow governance standards to flow from federal guidance into state and local implementation.
A county office managing expungement petitions has none of those things. It has a small staff managing heavy caseloads, legacy software, and procurement rules that predate adaptive systems. When we engaged public-sector stakeholders on pilot proposals, their questions were operational: Who is responsible if the tool produces an incorrect filing? Will this increase or decrease the volume of calls we receive? Can we audit the outputs after the fact? How does this fit our existing procurement categories?
These are legitimate governance questions. The frameworks do not ignore them, but they do not operationalize them at the level of institutional capacity that local agencies actually possess. That gap is where governance intentions fail to become governance outcomes.
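"Can we audit the outputs after the fact?" has a concrete operational answer that frameworks could specify. One minimal pattern, sketched here with hypothetical names, is an append-only log in which each record carries a hash of everything written before it, so after-the-fact tampering is detectable without any dedicated audit tooling.

```python
import hashlib
import json
import time

def append_audit_record(log_path: str, case_id: str, model_output: dict) -> str:
    """Append one audit record per AI output; returns the chained hash.

    Hypothetical sketch: each record stores a hash of the log's prior
    contents, which is what a post-hoc audit needs to trust the trail.
    """
    try:
        with open(log_path, "rb") as f:
            prev_hash = hashlib.sha256(f.read()).hexdigest()
    except FileNotFoundError:
        prev_hash = "genesis"  # first record in a new log
    record = {
        "case_id": case_id,
        "timestamp": time.time(),
        "output": model_output,
        "prev_hash": prev_hash,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return prev_hash
```

Nothing here requires dedicated audit staff to operate; it only requires someone to have asked the question at procurement time.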
Capacity constraints are only part of the story. In many cases, agencies have limited incentive to close that gap at all. Political cycles reward visible innovation over invisible governance infrastructure. A procurement that deploys a new system generates a press release. A procurement that builds audit capacity does not. Vendors are incentivized to lead with capability and treat auditability as a secondary feature. Budget structures in most agencies separate technology acquisition from long-term oversight funding, which means the resources to run a governance process are rarely present when the tool is adopted. These incentive mismatches do not compound capacity gaps by accident. They are predictable features of how public-sector technology adoption works.
Authority Is Procedural, Not Abstract
In rights-impacting domains, authority is not a philosophical concept. It is procedural and statutory. A judge has authority. A prosecutor has authority. A clerk has authority within defined limits. Software does not.
This creates a design constraint that cannot be optimized away. Any AI system deployed in a legal workflow must support institutional authority without eroding it, and must make that boundary legible to both users and institutional overseers.
Over-automation produces its own failure mode. When users come to believe that a system is making legal determinations rather than assisting with structured intake, trust erodes. When agencies suspect a vendor is substituting algorithmic judgment for statutory process, adoption stalls. Designing within authority boundaries is not a limitation imposed on technology from the outside. It is a governance principle that has to be built into the system from the start.
State Capacity Is the Binding Constraint
There is a persistent tendency to equate more sophisticated models with better outcomes. In practice, the binding constraint is usually institutional capacity, not model quality.
A well-designed, simple AI tool integrated into a clear, auditable workflow will often produce more reliable public value than a more capable model deployed into a brittle institutional environment. Governance mechanisms only function if the institution has the resources, incentives, and technical literacy to operate them.
The financial derivatives analogy is instructive here, with important limits. In the years preceding the 2008 crisis, instability arose partly because oversight structures, institutional incentives, and operational capacity hadn't kept pace with deployment. Accountability was diffuse. The instruments worked as designed, though some were also mispriced because of model failures, not only institutional lag. That's where the analogy breaks: AI governance failures are more likely to manifest as quiet erosion of due process than as acute systemic collapse. The harms accumulate through friction, delay, and uncorrected errors rather than cascading failure.
But the structural dynamic holds: governance frameworks that assume institutional capacity that doesn't exist will not fail loudly. They will fail through slow non-implementation, as agencies formally adopt frameworks while practically lacking the staff to run them.
What Operational Failure Actually Looks Like
When AI integration fails in bureaucratic systems, it rarely looks dramatic. There is no catastrophic output. No clear moment of accountability.
It looks like a mismatched file format. An inconsistent data field that breaks a downstream workflow. A system that performs under normal conditions but degrades under edge-case pressure, and no one notices until errors have accumulated across dozens of cases. An agency that can't categorize a new vendor tool under existing procurement rules, so approval stalls indefinitely.
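Failures like these can be caught at the intake boundary, while a human still owns the case, with unglamorous validation. A minimal sketch, with hypothetical field names:

```python
from typing import List

def validate_intake(record: dict) -> List[str]:
    """Return a list of problems instead of failing silently downstream.

    Field names are illustrative; the point is that format drift and
    missing data are surfaced at the boundary, not dozens of cases later.
    """
    problems = []
    required = ("case_id", "petition_type", "county", "attachment_name")
    for key in required:
        if not record.get(key):
            problems.append(f"missing field: {key}")
    name = record.get("attachment_name", "")
    if name and not name.lower().endswith((".pdf", ".docx")):
        # A mismatched file format is exactly the kind of quiet failure
        # that can strand a petition in an inbox for days.
        problems.append(f"unexpected attachment format: {name}")
    return problems
```

Checks like this are not sophisticated, which is the point: most operational failure is preventable with infrastructure no one is incentivized to fund.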
Regulators worry about worst-case harms. Agencies worry about incremental friction. Vendors worry about scaling. Each actor responds rationally to different incentive structures. Governance requires aligning those incentives before breakdowns occur, not after them. That means building it into procurement criteria, pilot evaluation frameworks, and the accountability structures that determine who owns an error when automation has compressed human review.
Toward an Integrated Frame
If AI governance is framed primarily as a frontier safety challenge, we risk systematically underinvesting in the deployment layer where most current public-sector AI operates. This is not a case against frontier safety research. It is a case for building the institutional infrastructure that makes responsible deployment possible at every layer, including where AI already operates today.
Perfectly aligned systems can still generate governance failures when deployed into brittle institutional environments. And institutional deployment failures, left unaddressed, erode the public trust that any serious AI governance project ultimately depends on.
The questions that a deployment-centered governance frame forces are different: What institutional conditions are required for this system to function accountably? What capacity do agencies actually have, and what do they need? What procurement structures enable responsible adoption? Who owns the error when automation compresses human review?
I've come to believe that AI governance will advance less through any single layer, whether alignment research, regulatory frameworks, or technical standards, and more through whether these layers are developed together, with an honest accounting of where institutional capacity currently stands.
If we treat governance as primarily a frontier problem, we risk neglecting the institutions that must carry it. Institutions do not fail dramatically. They fail quietly, through accumulated friction, unreviewed errors, and the slow loss of public trust in systems adopted before anyone figured out how to operate them accountably.
That is worth sitting with. Deployment-centered governance is slower, more procedural, and less rhetorically satisfying than frontier debates. It produces few headlines. It requires investing in audit capacity, procurement reform, and administrative infrastructure that most political actors have no incentive to champion. There is a real tension here: the same institutional caution that protects due process can also delay access for people who need it. The review checkpoint that preserved accountability in our workflow also made some applicants wait longer. That cost was real.
There is no clean resolution to that tension. There is only the discipline of naming it honestly and building systems that take both sides seriously.
That is where governance is decided. It is, at its core, a deployment problem.
Roger Roman is the Co-Founder and COO of LegalEase (Expungement.ai), a justice-tech platform building AI-assisted workflows for criminal record relief. He writes about AI governance, state capacity, and public-sector deployment.