Model Risk Management for Agentic AI

US model-risk guidance (OCC 2026-13) now excludes agentic AI. Here is how to govern agents with NIST, ISO 42001, and Treasury's framework until rules catch up.

There is a gap at the center of agentic AI governance in US banking, and it opened in 2026. The revised model-risk guidance — OCC Bulletin 2026-13 and SR 26-02, issued jointly by the OCC, Federal Reserve, and FDIC — states that generative and agentic AI are 'novel and rapidly evolving' and 'are not within the scope of this guidance.' The familiar rulebook for governing models in a bank now explicitly excludes the most consequential new class of them. This is the governance reality behind every use case in agentic AI in financial services.

Diagram: revised US model-risk guidance (OCC 2026-13 / SR 26-02) covers classic models but puts generative and agentic AI out of scope, so agent governance drops through to NIST AI RMF, ISO/IEC 42001, the US Treasury FS AI RMF, and existing law — CFPB 2022-03, the EU AI Act, GDPR Article 22 — landing in an agentic MRM program.

What model risk management was built for

For over a decade, US banks governed quantitative models under SR 11-7: a model inventory, independent validation, ongoing monitoring, and clear ownership across the model lifecycle. It assumes a definable model — stable inputs, a measurable output (a score, a rating, a forecast) — that an independent team can test and challenge.

Why agentic AI breaks the classic approach

An agent is not that kind of model. It is non-deterministic, it evolves, it calls external tools, and it acts across multiple steps rather than emitting a single score. Validation can no longer ask only "is this output accurate?" — it must ask "across sequences of decisions and actions, does this system behave within policy, and can we prove it did?" That is a different discipline, which is why the agencies carved agentic AI out of the existing guidance and signaled a future request for information.
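To make the shift from "is this output accurate?" to "did the sequence stay within policy?" concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Step` record, the tool names, and the limits are hypothetical and do not come from any regulator or framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    tool: str            # name of the tool the agent called at this step
    amount: float = 0.0  # monetary value touched by the step, if any


# Hypothetical policy: tool names and limits are illustrative only.
ALLOWED_TOOLS = {"lookup_customer", "fetch_transactions", "draft_sar"}
MAX_STEPS = 5
MAX_AMOUNT = 10_000.0


def policy_violations(trajectory: list[Step]) -> list[str]:
    """Return a readable list of policy breaches for one agent run."""
    violations = []
    if len(trajectory) > MAX_STEPS:
        violations.append(f"too many steps: {len(trajectory)} > {MAX_STEPS}")
    for i, step in enumerate(trajectory):
        if step.tool not in ALLOWED_TOOLS:
            violations.append(f"step {i}: tool '{step.tool}' not allowed")
        if step.amount > MAX_AMOUNT:
            violations.append(f"step {i}: amount {step.amount} exceeds limit")
    return violations
```

The check runs over the whole trajectory rather than the final answer, so a run that ends in an acceptable result can still be flagged for how it got there.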

The framework stack that fills the gap

Until dedicated rules arrive, governance is assembled from voluntary frameworks and existing law:

NIST AI Risk Management Framework — the Govern, Map, Measure, Manage core, plus a GenAI Profile that enumerates risk categories (confabulation, prompt injection, data poisoning, and more).
ISO/IEC 42001:2023 — the first certifiable AI management system standard, providing an auditable, Plan-Do-Check-Act governance backbone.
US Treasury Financial Services AI RMF (February 2026) — a sector-specific adaptation of the NIST framework for financial institutions.
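Existing law. CFPB Circular 2022-03 requires specific, explainable adverse-action reasons even for complex algorithms; the EU AI Act classifies credit scoring and risk assessment and pricing for life and health insurance as high-risk; and GDPR Article 22 restricts decisions based solely on automated processing that have legal or similarly significant effects.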

How to validate a non-deterministic agent

Classic validation hands an independent team a model and a held-out dataset. That does not translate to an agent, so validation has to be rebuilt around behavior over sequences. In practice that means four things: an evaluation set of representative cases scored on both the outcome and the trajectory (did the agent take sensible steps and call the right tools?); scenario and red-team testing that probes edge cases, prompt injection, and failure under bad inputs; challenger comparisons against the prior approach or a simpler baseline; and ongoing monitoring in production, because an agent that passed at launch can drift as data and behavior change. None of this is a one-time gate. It is a continuous cycle, consistent with how the NIST AI RMF frames its Measure and Manage functions. Document the cases, the results, and the decisions, so the validation itself is auditable.
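One way to score both outcome and trajectory is to run each evaluation case several times and report two rates. The sketch below assumes a hypothetical `agent` callable that returns a `Run`; the `Case` and `Run` types, the tool names, and the toy agent are all illustrative stand-ins, not a real agent API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    prompt: str
    expected_outcome: str
    required_tools: frozenset[str]  # tools a sensible trajectory must call


@dataclass(frozen=True)
class Run:
    outcome: str
    tools_called: tuple[str, ...]


def score_case(agent: Callable[[str], Run], case: Case, repeats: int = 5) -> dict[str, float]:
    """Repeat one case and report how often outcome and trajectory were right."""
    outcome_hits = 0
    trajectory_hits = 0
    for _ in range(repeats):
        run = agent(case.prompt)
        outcome_hits += run.outcome == case.expected_outcome
        trajectory_hits += case.required_tools <= set(run.tools_called)
    return {
        "outcome_rate": outcome_hits / repeats,
        "trajectory_rate": trajectory_hits / repeats,
    }


def make_flaky_agent() -> Callable[[str], Run]:
    """Toy stand-in for a real agent: skips the sanctions check on every other call."""
    calls = {"n": 0}

    def agent(prompt: str) -> Run:
        calls["n"] += 1
        if calls["n"] % 2 == 0:
            return Run("approve", ("lookup_customer",))
        return Run("approve", ("lookup_customer", "check_sanctions"))

    return agent
```

With the toy agent and `repeats=4`, the outcome rate is 1.0 while the trajectory rate is 0.5: the agent always reaches the expected answer but skips a required tool half the time, which outcome-only scoring would miss.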

What an agentic model-risk program needs

Mapping those frameworks onto an actual agent comes down to five capabilities, designed in from the start:

Validation of non-deterministic behavior — test over decision sequences and edge cases, not a single held-out set.
A complete audit trail — reasoning, data sources, tool calls, and the human's view at decision time, all reconstructable after the fact.
Human-in-the-loop checkpoints — defined thresholds above which actions route to a person.
Ongoing evaluation — quality measured continuously, because agents and data drift.
Monitoring and rollback — detect degradation and reverse course safely.
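Two of these capabilities, the audit trail and the human-in-the-loop checkpoint, can be sketched together. This is a minimal illustration under stated assumptions: the threshold value, the `AuditRecord` fields, and the `decide` function are hypothetical, and the record leaves out what the human reviewer saw at decision time, which a real program would also need to capture.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

APPROVAL_THRESHOLD = 5_000.0  # hypothetical limit; set by policy in practice


@dataclass
class AuditRecord:
    action: str
    amount: float
    reasoning: str
    data_sources: list[str]
    tool_calls: list[str]
    routed_to_human: bool
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def decide(action: str, amount: float, reasoning: str,
           data_sources: list[str], tool_calls: list[str],
           log: list[str]) -> str:
    """Log the proposed action, then route it to a person if it exceeds the threshold."""
    needs_human = amount > APPROVAL_THRESHOLD
    record = AuditRecord(action, amount, reasoning, data_sources, tool_calls, needs_human)
    # Append before acting so the trail exists even if execution later fails.
    log.append(json.dumps(asdict(record), sort_keys=True))
    return "pending_human_review" if needs_human else "auto_executed"
```

Writing the record before the action is a deliberate choice here: it keeps the trail reconstructable regardless of what happens downstream. A production log would be append-only storage rather than an in-memory list.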

These are the same controls that underpin a defensible audit trail and explainable lending, and that make AML/KYC automation and agentic AI in banking deployable. They are easier still to apply when retrieval itself is governed, as in agentic RAG.

The frameworks are not a reason to delay. They are the blueprint for deploying agents you can stand behind in a regulated institution. Talk to BlackGrid about building that program.

Frequently asked questions

Does SR 11-7 cover agentic AI?

Not as of 2026. The revised US model-risk guidance (OCC 2026-13 / SR 26-02) states that generative and agentic AI are novel and rapidly evolving and are not within its scope. Institutions must govern these systems using other frameworks and existing law until dedicated rules arrive.

Why doesn't classic model risk management fit AI agents?

Traditional MRM assumes a defined model with stable inputs and outputs that can be independently validated. An agent is non-deterministic, evolves, calls tools, and acts across multiple steps — so validation has to cover sequences of decisions and actions, not a single scored output.

What frameworks fill the gap?

The NIST AI Risk Management Framework (Govern, Map, Measure, Manage) and its GenAI Profile; ISO/IEC 42001, a certifiable AI management system standard; the US Treasury Financial Services AI Risk Management Framework; plus existing law such as CFPB adverse-action rules and GDPR Article 22.

What does an agentic model-risk program need?

Validation of non-deterministic behavior over decision sequences, a complete audit trail of reasoning and actions, defined human-in-the-loop checkpoints, ongoing evaluation, and monitoring for drift with the ability to roll back. Design these in from the start rather than retrofitting them.