Agent Governance Foundation

Authentication scales. Revocation doesn't.

Every IAM engineer learns this asymmetry early: issuing a credential is cheap. Revoking one is hard. We've spent decades building elegant solutions for the first problem and increasingly uncomfortable workarounds for the second.

For human users, the workarounds are tolerable. A 15-minute OAuth token expiry, an hourly OCSP check, a session invalidation table — these are fine when you're managing logins for employees who sign in once and work for eight hours. When something goes wrong, you reset the password, kill the session, and the blast radius is bounded by what one person could do in the propagation window.

Agents break every one of those assumptions.

Why authentication is (largely) solved

The problem of proving identity has been cracked. PKI, OAuth 2.0, OpenID Connect, FIDO2 — these are battle-tested systems that work at massive scale. Issuing a signed credential takes microseconds. Verifying one is even faster: check the signature, check the expiry, check the issuer. No network call required in the common case.

The engineering effort invested in authentication shows. Token issuance systems handle millions of operations per second. Verification can be done entirely offline against a public key. The cryptography is well-understood, the attack surface is mapped, and the failure modes are predictable.

This is the part of identity that agent frameworks have mostly inherited intact. An agent can receive a signed JWT asserting who it is, what it's authorized to do, and when that authorization expires. Verification of that token is just as fast for an agent as for a human. Authentication, as a problem, transfers cleanly.
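The three offline checks named above (signature, expiry, issuer) can be sketched in a few lines. This is an illustration only: it uses an HMAC from the Python standard library as a stand-in for the asymmetric signature a real JWT verifier would check against the issuer's public key, and the key, claim names, and function names are assumptions made for the example.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical shared key. A real deployment would verify an asymmetric
# signature against the issuer's public key rather than use a shared secret.
KEY = b"demo-key"


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode()


def issue(claims: dict, key: bytes = KEY) -> str:
    """Serialize the claims and append a MAC computed over them."""
    body = _b64(json.dumps(claims, sort_keys=True).encode())
    sig = _b64(hmac.new(key, body.encode(), hashlib.sha256).digest())
    return f"{body}.{sig}"


def verify(token: str, now: float, issuer: str, key: bytes = KEY) -> Optional[dict]:
    """Return the claims if signature, expiry, and issuer all check out."""
    try:
        body, sig = token.split(".")
    except ValueError:
        return None  # malformed token
    expected = _b64(hmac.new(key, body.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None  # signature mismatch
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims.get("exp", 0) <= now:
        return None  # expired
    if claims.get("iss") != issuer:
        return None  # wrong issuer
    return claims
```

Nothing in `verify` makes a network call, which is why it is fast. It is also why it cannot know whether the token was revoked after it was issued.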

Why revocation doesn't

Revoking a credential requires informing every system that might accept it that the credential is no longer valid. That's a distributed broadcast problem, and it doesn't get easier with scale — it gets harder.

Certificate Revocation Lists (CRLs) are the oldest approach: a signed list of revoked serial numbers, published periodically. They work until they don't. CRLs grow large, fetching them adds latency, and the update window means that a revoked certificate remains valid anywhere the old list is cached. In practice, many clients don't check CRLs at all.

OCSP improved on this with real-time, per-certificate status queries, but introduced a new problem: a centralized responder becomes a high-value target and a single point of failure. OCSP stapling and Must-Staple help, but they add complexity and their own failure modes.

JWT-based systems typically sidestep revocation entirely by using short expiry windows. Can't revoke a token? Make it expire in 15 minutes. This works as a pragmatic tradeoff for user sessions, but it's a workaround that papers over the underlying problem. You're not revoking compromised credentials — you're just limiting the window during which they can be exploited.

How agents make every revocation failure mode worse

Offline agents

A human user authenticates at login and stays online. When their session is revoked, the next request they make will fail. The window between revocation and detection is bounded by the next interaction.

Agents don't work that way. An agent handling a long-running task — processing a batch job, monitoring a data stream, executing a multi-step workflow — may make no outbound requests for minutes or hours while continuing to hold valid credentials. If those credentials are revoked mid-task, the agent has no mechanism to discover this unless it explicitly checks for revocation before each action.

Most agent frameworks don't build in revocation checks. They issue a token at startup and use it for the task's lifetime. An agent that should have stopped 30 minutes ago is still running, still holding active credentials, still capable of taking consequential actions.
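As a rough sketch of the alternative, an agent loop can consult a revocation check before every action instead of trusting the token it received at startup. The `is_revoked` callback and the `run_task` name are hypothetical; in practice the callback might query a local copy of a revocation list or a remote service.

```python
from typing import Callable, Iterable


def run_task(
    actions: Iterable[Callable[[], str]],
    token_id: str,
    is_revoked: Callable[[str], bool],
) -> list[str]:
    """Run actions in order, stopping once the token is seen as revoked."""
    results = []
    for action in actions:
        # Check before each action, not just once at startup.
        if is_revoked(token_id):
            break
        results.append(action())
    return results
```

The agent still completes whatever action is in flight when the revocation lands, so the check narrows the window to one action rather than closing it.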

Deep delegation chains

In multi-agent systems, authority flows through a delegation tree. A human authorizes an orchestrator; the orchestrator delegates to specialized sub-agents; those sub-agents may spawn further tools or workers. Each link in the chain carries a credential derived from the one above it.

When the human's authorization needs to be revoked — because they left the organization, because the task was cancelled, because a security incident was detected — you need to invalidate not just the human's credential, but every derived credential downstream.

With traditional IAM, this is a table lookup: find all sessions, tokens, and grants associated with the principal and revoke them. That works when the graph is shallow and centrally tracked. In a multi-agent system where delegation can happen dynamically, where sub-agents are spawned on demand, and where the full graph may not be centrally recorded, finding everything to revoke is itself a hard problem before you even begin propagation.
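To make the dependency on a recorded graph concrete, here is a minimal sketch of the "find everything to revoke" step as a breadth-first walk over parent-to-child delegation edges. The `children` mapping is an assumed data structure, not something any particular framework provides.

```python
from collections import deque


def descendants(children: dict[str, list[str]], root: str) -> list[str]:
    """Return every credential derived, directly or indirectly, from root."""
    found = []
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found
```

The walk only finds what was recorded. If a sub-agent spawned a worker without registering the edge, that worker's credential never appears in the result and is never revoked.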

Propagation delay

Even with the right architecture, revocation signals take time to propagate. Policy updates sync to distributed PDPs on a pull schedule. Cache TTLs introduce lag. Network partitions can delay propagation further.

For human users, a 30-second propagation window is rarely consequential. For an agent executing financial transactions, sending external communications, or modifying production infrastructure, 30 seconds is enough time to take dozens of irreversible actions after authorization should have been terminated.
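A back-of-the-envelope sketch shows how quickly the numbers add up. The additive worst case and the rates below are assumptions chosen for illustration, not measurements.

```python
def worst_case_window(pull_interval_s: float, cache_ttl_s: float) -> float:
    """Upper bound on staleness: a revocation issued just after a pull waits
    a full interval, and a decision cached just before the next pull lives
    for one more TTL."""
    return pull_interval_s + cache_ttl_s


def actions_exposed(window_s: float, actions_per_second: float) -> int:
    """Actions an agent can complete before the revocation takes effect."""
    return int(window_s * actions_per_second)
```

With an assumed 20-second pull interval and 10-second cache TTL, the window is 30 seconds; an agent completing two actions per second would get through 60 of them in that time.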

The window isn't a bug — it's a fundamental property of distributed systems. But agent deployments amplify its consequences in ways that human-facing IAM was never designed to handle.

The branch cut model

The insight that changes the problem is this: you don't need to revoke an identity. You need to cut a branch.

In a delegation tree, when something goes wrong with one agent or one delegation path, the goal is not to invalidate the root identity (which would affect everything) or to hunt down every leaf node (which is expensive and error-prone). The goal is to sever the delegation at the point where trust broke down.

Consider this tree:

Human (Alice)
├── Agent A — orchestrator          ← branch cut applied here
│   ├── Agent B — email sender         ✗ delegation invalidated
│   │   └── Agent C — scheduler        ✗ delegation invalidated
│   └── Agent D — calendar writer      ✗ delegation invalidated
└── Agent E — reporting agent       ← unaffected
    └── Agent F — data export          ← unaffected

If Agent A's behavior is anomalous (it has been prompt-injected, it is exceeding its authorized scope, or its session was compromised), the right response is to invalidate Agent A's delegation token. Because B, C, and D all hold credentials derived from A's delegation, those credentials are automatically invalidated when A's token is revoked. Alice's root authority is unaffected. Agents E and F continue operating normally.

The branch cut model requires that every delegation token explicitly encodes its parent. When a token is revoked, any token that lists it as a parent in its lineage chain is transitively invalidated. Checking this doesn't require a global broadcast — it requires that each PDP, before accepting a delegation token, verify that no ancestor in the chain has been revoked.
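A minimal sketch of that ancestor check, using the example tree above. The `DelegationToken` shape and the field names are hypothetical; the only property that matters is that each token carries the IDs of its ancestors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelegationToken:
    delegation_id: str
    lineage: tuple[str, ...] = ()  # ancestor delegation IDs, root first


def delegate(parent: DelegationToken, delegation_id: str) -> DelegationToken:
    """Derive a child token whose lineage extends the parent's."""
    return DelegationToken(delegation_id, parent.lineage + (parent.delegation_id,))


def is_valid(token: DelegationToken, revoked: set[str]) -> bool:
    """A token is valid only if neither it nor any ancestor is revoked."""
    chain = token.lineage + (token.delegation_id,)
    return not any(d in revoked for d in chain)
```

Revoking the single ID `"A"` is enough to fail validation for B, C, and D, while Alice, E, and F still pass, because none of their chains contain `"A"`.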

This is implementable as a revocation index at the Trust Evaluation Service: a compact set of revoked delegation IDs, with a monotonic version number. PDPs pull this index periodically and check it on every delegation-chain validation. The index is small (revocations are infrequent) and the check is fast (set membership). Propagation lag is still present, but its scope is bounded to the time between index updates rather than a full policy sync.
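One way such an index and its pull-based sync might look, as a sketch under simplifying assumptions: the version is just the number of revocations so far, and a PDP asks only for entries added since the version it last saw. The class and method names are illustrative and are not drawn from any specification.

```python
class RevocationIndex:
    """Held by the Trust Evaluation Service (illustrative only)."""

    def __init__(self) -> None:
        self.version = 0
        self._log: list[str] = []  # revoked delegation IDs, in order

    def revoke(self, delegation_id: str) -> int:
        self._log.append(delegation_id)
        self.version += 1  # monotonic: one increment per revocation
        return self.version

    def since(self, version: int) -> tuple[int, list[str]]:
        """Return the current version and the entries added after `version`."""
        return self.version, self._log[version:]


class PDP:
    def __init__(self) -> None:
        self.version = 0
        self.revoked: set[str] = set()

    def sync(self, index: RevocationIndex) -> None:
        """Pull only the delta since the last sync."""
        new_version, added = index.since(self.version)
        self.revoked.update(added)
        self.version = new_version

    def accepts(self, chain: list[str]) -> bool:
        """Accept a delegation chain only if no ID in it is revoked."""
        return self.revoked.isdisjoint(chain)
```

Between a call to `revoke` and the next `sync`, the PDP still accepts the revoked chain. That gap is the propagation lag, and it is bounded by the sync interval.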

Expiration by default

The second structural change is treating expiration not as a fallback but as the primary revocation mechanism.

In traditional IAM, credentials are often long-lived or non-expiring by default. Revocation is reserved for exceptional cases, when something goes wrong. Because each revocation is expensive, long-lived credentials keep day-to-day operational overhead low.

For agents, this should be inverted: permissions should be short-lived leases that decay unless explicitly renewed.

A delegation token issued to an agent for a specific task should carry an expiry tied to the expected task duration, not a generic session length. A token scoped to "process this week's invoice batch" should expire in hours, not days. A token scoped to "monitor this data stream" should carry a renewal obligation — the agent must check in with the Domain Authority at regular intervals to confirm its authorization is still valid.

Expiration by default achieves two things. First, it bounds the blast radius of any individual credential: a compromised token is only useful until its expiry. Second, it creates natural checkpoints for re-evaluation — at each renewal, the Domain Authority can verify that the principal's authorization is still current, that no revocation has occurred upstream, and that the environmental risk hasn't crossed a threshold.
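The lease-and-renew pattern can be sketched as follows. `Lease`, `DomainAuthority.grant`, and `DomainAuthority.renew` are hypothetical names for this example; time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    delegation_id: str
    expires_at: float


def is_active(lease: Lease, now: float) -> bool:
    return now < lease.expires_at


class DomainAuthority:
    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        self.revoked: set[str] = set()
        self.audit_log: list[tuple[float, str, str]] = []

    def grant(self, delegation_id: str, now: float) -> Lease:
        self.audit_log.append((now, delegation_id, "granted"))
        return Lease(delegation_id, now + self.lease_seconds)

    def renew(self, lease: Lease, now: float) -> bool:
        """Extend the lease unless it has expired or been revoked."""
        ok = is_active(lease, now) and lease.delegation_id not in self.revoked
        if ok:
            lease.expires_at = now + self.lease_seconds
        self.audit_log.append((now, lease.delegation_id, "renewed" if ok else "denied"))
        return ok
```

Once a delegation ID is added to `revoked`, the next renewal is denied and the lease simply runs out. With no further action, the credential dies at most one lease length after its last successful renewal.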

The cost is increased load on the Domain Authority for renewal requests. This is a worthwhile tradeoff. Short-lived tokens with renewal obligations are more operationally complex than long-lived ones, but they're significantly more recoverable when something goes wrong.

What this requires from governance infrastructure

Neither the branch cut model nor expiration by default can be implemented in application code alone. They require infrastructure:

A delegation registry that records every parent-child relationship in the delegation tree, queryable by the PDP at validation time. Without this registry, you cannot traverse the tree to check for revoked ancestors.

A revocation index maintained by the Trust Evaluation Service — a compact, versioned record of invalidated delegation IDs that PDPs can sync efficiently. The index needs to be append-only and cryptographically signed so its integrity can be verified.

Token lineage encoding in every delegation credential: each token must carry a reference to its parent delegation ID so that ancestry can be checked without querying the issuing agent. The chain needs to be self-describing.

Renewal protocols that allow agents to extend expiring credentials through an authenticated interaction with the Domain Authority, with that renewal logged as an auditable event.
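As an illustration of the append-only, signed revocation index described above, the sketch below hash-chains each entry onto the previous head and signs the resulting head together with the version. It uses an HMAC with a hypothetical shared key as a stand-in for a real digital signature, and every name in it is an assumption made for the example.

```python
import hashlib
import hmac

KEY = b"demo-signing-key"  # hypothetical; a real index would use a private key


def _chain(prev_head: str, delegation_id: str) -> str:
    """Fold a new entry into the running hash of the log."""
    return hashlib.sha256(f"{prev_head}:{delegation_id}".encode()).hexdigest()


class SignedIndex:
    def __init__(self, key: bytes = KEY) -> None:
        self._key = key
        self.entries: list[str] = []
        self.head = ""

    def append(self, delegation_id: str) -> None:
        self.entries.append(delegation_id)
        self.head = _chain(self.head, delegation_id)

    def snapshot(self) -> dict:
        """Export the entries plus a signature over (version, head)."""
        version = len(self.entries)
        msg = f"{version}:{self.head}".encode()
        sig = hmac.new(self._key, msg, hashlib.sha256).hexdigest()
        return {"version": version, "entries": list(self.entries), "sig": sig}


def verify_snapshot(snap: dict, key: bytes = KEY) -> bool:
    """Recompute the hash chain from the entries and check the signature."""
    head = ""
    for delegation_id in snap["entries"]:
        head = _chain(head, delegation_id)
    msg = f"{snap['version']}:{head}".encode()
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return snap["version"] == len(snap["entries"]) and hmac.compare_digest(
        snap["sig"], expected
    )
```

Because the signature covers the chained head, dropping, reordering, or altering any entry in a snapshot causes verification to fail at the PDP.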

These are not novel cryptographic ideas. They borrow from certificate transparency, OAuth token exchange, and SPIFFE/SPIRE patterns that already exist in production infrastructure. What's missing is a coherent specification that pulls them together for the agent context — one that any agent framework can implement, with a conformance suite to verify correctness.

That specification is what we're building. The revocation problem is solvable. It just requires treating agents as a distinct class of principal rather than a variation on a service account.


The delegation registry and revocation index are components of the Agent Authorization Protocol (AAP) specification, currently in early draft. Feedback from IAM practitioners and agent framework developers is particularly valuable at this stage.