Agent Governance Foundation

CISO Brief: The Actual Attack Surface of Deployed Agents

Third in our CISO Brief series. See also: what securing AI agents actually means and mapping agent governance to compliance frameworks.

"The AI might do something unexpected" is the risk statement most agent security conversations start with, and it's too vague to act on. It doesn't tell you what to monitor, what to alert on, or what control stops the incident before it spreads. A useful risk model needs specific attack patterns, not a general sense of unease.

Four patterns show up repeatedly across agent deployments. They map onto standard security taxonomy, but they present differently when the actor is an agent acting on interpreted natural-language intent rather than a human executing a known script.

1. Data exfiltration

An agent with read access to sensitive data and any outbound channel (an email tool, a webhook call, a file write to a shared location) can move data out of scope without anyone issuing a command that looks malicious. The trigger is often not a compromised credential but a manipulated instruction: a prompt injection embedded in a document the agent read, or a legitimate-looking task that happens to combine read and egress permissions the agent should never have held at the same time.

What to watch for: unusual combinations of read scope and outbound capability on the same agent; volume or destination anomalies on actions that are individually authorized.
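A minimal sketch of the first check, assuming permissions are available as plain string labels per agent. The label names and the `AgentGrant` type are hypothetical; a real deployment would derive them from its own tool registry and data classification.

```python
from dataclasses import dataclass

# Hypothetical labels, chosen only for illustration.
SENSITIVE_READ = {"read:customer_pii", "read:finance"}
EGRESS = {"tool:email_send", "tool:webhook_post", "write:shared_drive"}


@dataclass(frozen=True)
class AgentGrant:
    agent_id: str
    permissions: frozenset


def risky_combinations(grants):
    """Return (agent_id, sensitive_reads, egress_tools) for every agent
    that holds at least one sensitive read AND at least one outbound channel."""
    findings = []
    for grant in grants:
        reads = sorted(grant.permissions & SENSITIVE_READ)
        outs = sorted(grant.permissions & EGRESS)
        if reads and outs:
            findings.append((grant.agent_id, reads, outs))
    return findings


grants = [
    AgentGrant("report-writer", frozenset({"read:finance", "write:local_tmp"})),
    AgentGrant("support-bot", frozenset({"read:customer_pii", "tool:email_send"})),
]
findings = risky_combinations(grants)
```

Here only `support-bot` is flagged: `report-writer` reads sensitive data but has no outbound channel. A check like this covers the static combination only; the volume and destination anomalies still need runtime monitoring.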

2. Privilege escalation

Multi-agent systems delegate: an orchestrator spawns specialist sub-agents, each nominally scoped to a narrower task. The failure mode is a sub-agent that is granted, or inherits, more authority than its task requires, either through a scoping bug or because the delegation chain was never actually bounded by the parent's own permissions.

What to watch for: a sub-agent's requested scope exceeding what its parent was authorized for; delegation chains where trust grows at any hop instead of holding steady or shrinking.
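Both checks reduce to set containment if scopes are modeled as sets of permission strings. That is an assumption made for this sketch, and the function names and example scopes are illustrative, not taken from any particular framework.

```python
def excess_scope(parent_scope, requested_scope):
    """Permissions the child asks for that the parent does not itself hold."""
    return set(requested_scope) - set(parent_scope)


def first_widening_hop(chain):
    """Given scopes ordered from root to leaf, return the index of the first
    hop whose scope is not a subset of the hop before it, or None if the
    chain never widens."""
    for i in range(1, len(chain)):
        if not set(chain[i]) <= set(chain[i - 1]):
            return i
    return None


orchestrator = {"crm:read", "tickets:read", "tickets:write"}
triage = {"tickets:read", "tickets:write"}
summarizer = {"tickets:read", "billing:read"}  # billing:read was never held upstream

extra = excess_scope(triage, summarizer)
hop = first_widening_hop([orchestrator, triage, summarizer])
```

In this example `extra` contains `billing:read` and `hop` is 2, the position of `summarizer` in the chain. Running the check at delegation time, before the sub-agent is issued anything, is what turns it from a detection into a control.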

3. Resource hijacking

An agent authorized for a legitimate compute- or resource-intensive task gets redirected — again, often via a manipulated instruction rather than a stolen credential — toward a different resource entirely: spinning up infrastructure, consuming API quota, or running a workload that has nothing to do with its original purpose.

What to watch for: action patterns that diverge from an agent's established baseline, even when each individual action is within its nominal scope.
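One simple way to express "diverges from baseline" is to compare how often each action type appears in a recent window against its share of the historical record. The sketch below assumes actions are logged as plain strings, and the 3x ratio threshold is an arbitrary illustrative value, not a recommendation.

```python
from collections import Counter


def divergent_actions(baseline, recent, ratio=3.0):
    """Flag action types that are new, or whose share of recent activity
    exceeds `ratio` times their share of the baseline.
    Returns {action: (baseline_share, recent_share)}."""
    base_counts = Counter(baseline)
    recent_counts = Counter(recent)
    flagged = {}
    for action, count in recent_counts.items():
        recent_share = count / len(recent)
        base_share = base_counts[action] / len(baseline) if baseline else 0.0
        if base_share == 0.0 or recent_share > ratio * base_share:
            flagged[action] = (base_share, recent_share)
    return flagged


baseline = ["db.read"] * 90 + ["report.write"] * 10
recent = ["db.read"] * 5 + ["vm.create"] * 5  # vm.create never seen before
flagged = divergent_actions(baseline, recent)
```

With this data only `vm.create` is flagged, even though the agent may be nominally allowed to call it. A production detector would want time-aware windows and per-resource baselines; the point here is only that the comparison is against the agent's own history, not against its permission list.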

4. Lateral movement

An agent compromised or manipulated in one context uses its legitimate delegation relationships to reach systems or data it was never meant to touch directly — riding trust relationships between agents rather than exploiting a new vulnerability per hop.

What to watch for: an agent's actions crossing into resources or systems unrelated to its declared scope, especially via a delegation chain rather than a direct request.

The controls that address these

None of these four patterns are stopped by better prompting or a stronger system message — a system prompt is not an authorization control. The controls that actually apply:

  • Per-action policy evaluation, not per-session. A session-level "this agent is trusted" grant is exactly what lets exfiltration and hijacking hide inside otherwise-authorized sessions. Every action needs its own decision.
  • Delegation-aware trust scoring. Trust shouldn't be inherited wholesale down a delegation chain — deeper chains should carry more scrutiny, not less, which is the direct countermeasure to privilege escalation and lateral movement.
  • Behavioral anomaly detection running continuously, not just at deployment review. A five-minute scan cadence against known attack patterns catches drift between what an agent normally does and what it's doing right now — far faster than a quarterly access review would.
  • Branch-cut revocation. When containment is needed, it has to suspend the compromised agent and everything it delegated to atomically — a single agent revocation that leaves its sub-agents live isn't containment, it's a delay. See why revocation doesn't scale the same way authentication does.
  • Just-in-time, expiring credentials. An agent that holds a credential only for the duration of the task that needs it shrinks the exfiltration and hijacking window to that task's lifetime rather than the agent's entire operational life.
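Three of the controls above (per-action decisions, branch-cut revocation, and expiring credentials) can be sketched together in a few dozen lines. This is a single-process illustration under stated assumptions: the `Registry` and `Agent` types are hypothetical, and atomicity comes from one in-memory lock. A distributed deployment would need a shared revocation store to get the same guarantee.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Agent:
    agent_id: str
    scope: frozenset
    credential_expires_at: float  # epoch seconds
    children: list = field(default_factory=list)


class Registry:
    def __init__(self):
        self._agents = {}
        self._revoked = set()
        self._lock = threading.Lock()

    def register(self, agent, parent_id=None):
        with self._lock:
            self._agents[agent.agent_id] = agent
            if parent_id is not None:
                self._agents[parent_id].children.append(agent.agent_id)

    def revoke_branch(self, agent_id):
        """Revoke an agent and every agent it delegated to, under one lock,
        so no authorize() call can observe a half-revoked branch."""
        with self._lock:
            stack, branch = [agent_id], set()
            while stack:
                current = stack.pop()
                if current in branch:
                    continue
                branch.add(current)
                stack.extend(self._agents[current].children)
            self._revoked |= branch
            return branch

    def authorize(self, agent_id, action, now):
        """Decide one action. Unknown, revoked, or expired agents are denied."""
        with self._lock:
            agent = self._agents.get(agent_id)
            if agent is None or agent_id in self._revoked:
                return False
            if now >= agent.credential_expires_at:
                return False
            return action in agent.scope


reg = Registry()
scope = frozenset({"tickets:read"})
reg.register(Agent("orchestrator", scope, 1000.0))
reg.register(Agent("sub-a", scope, 1000.0), parent_id="orchestrator")
reg.register(Agent("sub-a-1", scope, 1000.0), parent_id="sub-a")
reg.register(Agent("bystander", scope, 1000.0))

allowed_before = reg.authorize("sub-a-1", "tickets:read", now=500.0)
cut = reg.revoke_branch("sub-a")
```

After `revoke_branch("sub-a")`, both `sub-a` and `sub-a-1` are denied on their next action while `orchestrator` and `bystander` are unaffected. Because `authorize` is called per action and takes the current time, an expired credential is denied the same way without any explicit revocation.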

The five-step response cycle

When one of these patterns triggers an alert, the playbook is the same regardless of which pattern it was:

  1. Detect — anomaly alerts, trust score drops, or external reports surface the incident.
  2. Contain — revoke the affected agent immediately, cascading to sub-agents if it has any. Revocation needs to take effect in milliseconds, not minutes — every subsequent decision should deny by default from that point forward.
  3. Investigate — pull the signed, timestamped audit artifacts for the affected agent and replay exactly what it did and when, rather than reconstructing the timeline from application logs.
  4. Remediate — fix the root cause: rotate credentials, patch the agent, tighten the policy that should have caught this.
  5. Reinstate — bring the agent back with a new identity and a clean trust score, not a resumed one — an incident shouldn't leave residual trust attached to the replacement.
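The Investigate step depends on audit records that can be verified before they are trusted. The brief does not specify a signing scheme, so the sketch below assumes HMAC-SHA256 over a canonical JSON encoding with a shared key. The record fields (`agent_id`, `ts`, `action`) are hypothetical, and a real system might well use asymmetric signatures instead.

```python
import hashlib
import hmac
import json


def sign_record(key, record):
    """HMAC-SHA256 over a canonical (sorted-key, compact) JSON encoding."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verified_timeline(key, entries, agent_id):
    """Verify every entry's signature, then return the given agent's records
    ordered by timestamp. Raises ValueError on the first bad signature."""
    timeline = []
    for index, entry in enumerate(entries):
        expected = sign_record(key, entry["record"])
        if not hmac.compare_digest(expected, entry["sig"]):
            raise ValueError(f"signature mismatch at entry {index}")
        if entry["record"]["agent_id"] == agent_id:
            timeline.append(entry["record"])
    return sorted(timeline, key=lambda record: record["ts"])


KEY = b"demo-key-for-illustration-only"
records = [
    {"agent_id": "agent-7", "ts": 1700000300, "action": "email.send"},
    {"agent_id": "agent-9", "ts": 1700000100, "action": "db.read"},
    {"agent_id": "agent-7", "ts": 1700000200, "action": "db.read"},
]
entries = [{"record": r, "sig": sign_record(KEY, r)} for r in records]
timeline = verified_timeline(KEY, entries, "agent-7")
```

The resulting timeline for `agent-7` is `db.read` followed by `email.send`, regardless of the order in which the entries were stored. If any record was altered after signing, the replay fails loudly instead of producing a plausible but wrong sequence of events.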

The takeaway

None of these four patterns require a novel defense. They require applying the access-control discipline you already run for human identities — least privilege, per-action authorization, fast and total revocation — to an actor that makes its own decisions about which permissions to use, at machine speed, based on instructions it can't fully verify. The gap isn't a new category of threat. It's that most agent deployments today don't have the enforcement layer to apply the old discipline at all.
