Agentic AI in Financial Services: Use Cases & Risks
By The BlackGrid Team, Enterprise AI
Agentic AI in financial services refers to AI systems that act, not just answer — perceiving live data, reasoning over it, planning multi-step workflows, executing actions across tools and systems, and adapting based on outcomes. In banking, that means working an AML alert through to case closure. In insurance, it means taking a claim from first notice of loss to a settlement recommendation. The underlying shift is from AI as a scoring engine that humans act on, to AI as an actor that humans supervise.
McKinsey describes this as a paradigm shift in banking operations, moving staff from rule-based execution toward decisions and customer engagement. Gartner forecasts that by the end of 2026, 40% of enterprise applications will feature task-specific AI agents — up from under 5% in 2025. The pressure to deploy is real; so is the pressure to do it safely in a regulated industry.
What agentic AI means in a finance context
Conventional AI in finance produces outputs — a fraud score, a credit grade, a claims propensity — that a human or a rules engine then acts on. Agentic AI closes the loop. It follows a perceive → reason → plan → act → adapt cycle:
- Perceive: ingest structured and unstructured inputs (transaction feeds, documents, images, third-party data).
- Reason: apply a model, policy, or multi-step chain of thought to understand the situation.
- Plan: determine which actions are needed and in which order, including which tools or APIs to call.
- Act: execute — query a bureau, update a case management system, generate a decision memo, send a notification.
- Adapt: incorporate feedback (additional data, a human reviewer's correction, a downstream outcome) and revise course.
This architecture differs fundamentally from robotic process automation (RPA), which follows fixed scripts and fails on variation, and from standalone ML, which scores but does not execute. A useful reference point: retrieval-augmented generation is one common component in the "perceive" layer, grounding agent reasoning in current documents rather than stale model weights.
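The cycle described above can be sketched as a small control loop. This is a minimal illustration, not a production design: `AgentState`, `run_agent`, the toy `perceive`/`reason`/`plan` functions, and the two tool names are all hypothetical and exist only to make the loop concrete.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Working memory the agent carries through one case (hypothetical)."""
    observations: dict = field(default_factory=dict)
    log: list = field(default_factory=list)
    done: bool = False


def run_agent(state, perceive, reason, plan, tools, max_steps=5):
    """Run perceive -> reason -> plan -> act -> adapt until no actions remain."""
    for step in range(max_steps):
        state.observations.update(perceive(state))       # Perceive
        assessment = reason(state.observations)          # Reason
        actions = plan(assessment)                       # Plan
        if not actions:
            state.done = True
            break
        for name in actions:                             # Act
            result = tools[name](state.observations)
            state.observations[name] = result            # Adapt: feeds next cycle
            state.log.append((step, name, result))
    return state


# Toy wiring: the agent keeps calling tools until no evidence is missing.
def perceive(state):
    return {"amount": 9200}


def reason(obs):
    return [k for k in ("device_check", "velocity_check") if k not in obs]


def plan(missing):
    return missing[:1]  # one tool call per cycle


tools = {
    "device_check": lambda obs: "known_device",
    "velocity_check": lambda obs: "normal",
}

final = run_agent(AgentState(), perceive, reason, plan, tools)
```

The `max_steps` bound is a deliberate choice in this sketch: an agent that cannot finish within its budget stops with `done` left as `False`, so a caller can treat it as unresolved.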
Banking use cases
Fraud detection and investigation
Real-time fraud detection is one of the most mature agentic deployments in banking. The first stage — scoring a transaction against behavioral and network features — is established ML. The agentic advance is what happens next: automatically gathering corroborating evidence (related transactions, device fingerprint, velocity checks, linked account history), constructing a case summary, and either resolving the alert or escalating it to an analyst with the relevant context already assembled.
The result is cycle-time compression and analyst leverage: the agent assembles the evidence; the analyst reviews and decides. Fraud teams focus human judgment on genuinely ambiguous cases rather than data retrieval.
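As a rough sketch of the evidence-assembly step, the snippet below computes a velocity check and a device check for one alert, then proposes a disposition. The field names, the 10-minute window, and the `max_velocity` threshold are illustrative assumptions, not values from any specific fraud platform.

```python
from datetime import datetime, timedelta


def velocity(timestamps, at, window=timedelta(minutes=10)):
    """Count transactions in the window ending at `at` (inclusive)."""
    return sum(1 for t in timestamps if at - window <= t <= at)


def build_case(alert, history, known_devices, max_velocity=3):
    """Assemble evidence for one alert and propose a disposition."""
    evidence = {
        "velocity_10m": velocity([h["time"] for h in history], alert["time"]),
        "known_device": alert["device_id"] in known_devices,
        "related_txn_ids": [
            h["id"] for h in history if h["merchant"] == alert["merchant"]
        ],
    }
    suspicious = (
        evidence["velocity_10m"] > max_velocity or not evidence["known_device"]
    )
    return {
        "alert_id": alert["id"],
        "evidence": evidence,
        "disposition": "escalate_to_analyst" if suspicious else "auto_resolve",
    }


noon = datetime(2025, 3, 1, 12, 0)
history = [
    {"id": "t1", "time": noon - timedelta(minutes=30), "merchant": "M9"},
    {"id": "t2", "time": noon - timedelta(minutes=8), "merchant": "M4"},
    {"id": "t3", "time": noon - timedelta(minutes=5), "merchant": "M4"},
    {"id": "t4", "time": noon - timedelta(minutes=2), "merchant": "M4"},
    {"id": "t5", "time": noon - timedelta(minutes=1), "merchant": "M7"},
]
alert = {"id": "a1", "time": noon, "device_id": "d1", "merchant": "M4"}
case = build_case(alert, history, known_devices={"d1"})
```

The analyst receives the `evidence` dictionary alongside the proposed disposition, so the review starts from assembled context rather than from raw data retrieval.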
AML/KYC alert triage and case closure
Financial crime compliance is one of the highest-cost, lowest-signal workflows in large banks. The majority of AML alerts generated by transaction-monitoring rules close as false positives after manual review. Agentic AI addresses this by automating the triage layer: enriching each alert with customer history, counterparty screening, typology matching, and regulatory context; scoring it for investigative priority; and, for clearly low-risk cases, generating a documented closure rationale.
The key governance requirement here is a full audit trail: every enrichment step, every data source consulted, and every closure rationale must be logged and attributable for regulator review. Agentic systems that lack this auditability are not deployable in production AML environments.
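One way to make such a trail tamper-evident is to chain each entry to the hash of the previous one. The sketch below is an assumption on our part rather than a regulatory requirement: hash chaining is a common technique, and the entry fields (`step`, `source`, `detail`) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """Stable SHA-256 over the entry body (keys sorted for determinism)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(trail, step, source, detail):
    """Append one logged step, linked to the hash of the previous entry."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS
    body = {"step": step, "source": source, "detail": detail, "prev": prev_hash}
    trail.append({**body, "hash": _digest(body)})
    return trail


def verify(trail):
    """Return True only if every entry is intact and correctly linked."""
    prev = GENESIS
    for entry in trail:
        body = {k: entry[k] for k in ("step", "source", "detail", "prev")}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


trail = []
append_entry(trail, "enrich", "customer_history", "12 months retrieved")
append_entry(trail, "screen", "sanctions_list", "no counterparty match")
append_entry(trail, "close", "agent", "low risk: matches known payroll pattern")
intact = verify(trail)
```

Editing any earlier entry changes its hash and breaks the link to every entry after it, which is what lets a reviewer detect after-the-fact modification.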
Credit underwriting and loan processing
Mortgage and small-business loan underwriting involves pulling credit bureau data, verifying income documents, checking policy rules, calculating DTI and LTV, and generating an underwriting memo — a multi-step workflow that an agentic system can coordinate end-to-end. The agent orchestrates document extraction (often via a vision or OCR model), bureau calls, policy checks, and exception flagging, before presenting a recommendation to an underwriter.
McKinsey highlights credit turnaround time as the primary operational metric: multi-day manual workflows compress toward same-day automated decisioning for in-policy applications, while out-of-policy files route to human underwriters with pre-populated analysis.
For consumer credit decisions, explainability is not optional: CFPB Circular 2022-03 makes clear that adverse-action notice requirements apply even when decisions involve complex algorithms. Any agentic underwriting system must be able to produce human-readable reasons for a denial.
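To make the policy-check and explainability points concrete, here is a minimal sketch that computes DTI (monthly debt payments divided by gross monthly income) and LTV (loan amount divided by appraised value) and emits human-readable reasons when a file falls outside policy. The thresholds and the reason wording are placeholders for illustration; they are not regulatory limits or official adverse-action language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    monthly_debt: float
    gross_monthly_income: float
    loan_amount: float
    appraised_value: float


# Illustrative policy limits only; a real lender sets its own.
MAX_DTI = 0.43
MAX_LTV = 0.80


def assess(app):
    """Compute DTI and LTV, then list a plain-language reason per breach."""
    dti = app.monthly_debt / app.gross_monthly_income
    ltv = app.loan_amount / app.appraised_value
    reasons = []
    if dti > MAX_DTI:
        reasons.append(f"Debt-to-income ratio {dti:.0%} exceeds limit {MAX_DTI:.0%}")
    if ltv > MAX_LTV:
        reasons.append(f"Loan-to-value ratio {ltv:.0%} exceeds limit {MAX_LTV:.0%}")
    outcome = "refer_to_underwriter" if reasons else "in_policy"
    return {"dti": round(dti, 3), "ltv": round(ltv, 3),
            "outcome": outcome, "reasons": reasons}


result = assess(Application(
    monthly_debt=2500.0,
    gross_monthly_income=5000.0,
    loan_amount=180_000.0,
    appraised_value=240_000.0,
))
```

Because every breach appends its own reason at the point where the rule fires, the explanation is produced by the same code path as the decision rather than reconstructed afterwards.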
Reconciliation
Back-office reconciliation — matching trades, payments, or ledger entries across systems — is a high-volume, rule-heavy process that is simultaneously tedious and consequential. Agentic AI handles matching at scale, flags genuine breaks for investigation, and can initiate repair workflows for common exception types. The ROI case is operational cost reduction combined with faster break identification.
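The matching-and-break-flagging logic can be sketched in a few lines. This example assumes each entry carries a shared `ref` key and a `Decimal` amount, which is a simplification: real reconciliations often match on several fields and apply fuzzier rules.

```python
from decimal import Decimal


def reconcile(ledger_a, ledger_b, tolerance=Decimal("0.00")):
    """Match entries by reference; return (matched refs, list of breaks)."""
    b_by_ref = {entry["ref"]: entry for entry in ledger_b}
    matched, breaks = [], []
    for a in ledger_a:
        b = b_by_ref.pop(a["ref"], None)
        if b is None:
            breaks.append((a["ref"], "missing_in_b"))
        elif abs(a["amount"] - b["amount"]) > tolerance:
            breaks.append((a["ref"], "amount_mismatch"))
        else:
            matched.append(a["ref"])
    # Anything left in B was never seen in A.
    breaks.extend((ref, "missing_in_a") for ref in b_by_ref)
    return matched, breaks


ledger_a = [
    {"ref": "T1", "amount": Decimal("100.00")},
    {"ref": "T2", "amount": Decimal("250.50")},
    {"ref": "T3", "amount": Decimal("75.00")},
]
ledger_b = [
    {"ref": "T1", "amount": Decimal("100.00")},
    {"ref": "T2", "amount": Decimal("250.05")},
    {"ref": "T4", "amount": Decimal("10.00")},
]
matched, breaks = reconcile(ledger_a, ledger_b)
```

Each break carries a typed reason, which is what would let a downstream step pick a repair workflow for common exception types and send the rest to an investigator.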
Customer service
Agentic customer service (account inquiries, servicing requests, product guidance) is the most widely trialed category but tends to sit earlier on the maturity curve than fraud or underwriting. The main differentiator from conventional chatbots is the ability to take backend actions — updating an address, initiating a dispute, retrieving a specific transaction — rather than retrieving static FAQ content.
Insurance use cases
Underwriting and straight-through processing
Commercial and personal lines underwriting involves intake (application data, supplemental documents, third-party data enrichment), risk assessment, pricing, and bind — a sequence that traditionally involves multiple handoffs between teams. Agentic AI enables straight-through processing (STP): the agent ingests the submission, enriches it with external data sources, applies underwriting rules, and outputs a priced indication or a referred file, without manual intervention for in-appetite risks.
The industry direction of travel is clear: a range of carriers and MGAs are targeting STP for their standard-risk book, with complex and non-standard risks remaining human-led. Precise STP rate claims from vendor sources vary widely and should be treated as directional until independently benchmarked.
Claims processing: FNOL to settlement
Claims is the most heavily automated function in insurance, and increasingly the leading deployment area for agentic AI. The workflow runs from first notice of loss (FNOL) through coverage verification, reserve setting, investigation, and settlement.
An agentic FNOL system can handle initial intake across channels (web, voice, app), extract structured claim data from unstructured descriptions, verify coverage against the policy, assign a handler or initiate automatic settlement for simple cases, and trigger investigation workflows for complex ones. For low-complexity, high-frequency losses (minor auto, simple property), end-to-end automation to payment is technically achievable; the remaining design question is governance and policyholder communication.
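A simplified version of the coverage check and routing decision might look like the following. The `Policy` and `Claim` fields, the route names, and the `AUTO_SETTLE_LIMIT` value are hypothetical; a real carrier would derive them from its own policy admin system and claims authority rules.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Policy:
    start: date
    end: date
    covered_perils: frozenset


@dataclass(frozen=True)
class Claim:
    loss_date: date
    peril: str
    estimated_amount: float
    injuries: bool


AUTO_SETTLE_LIMIT = 1500.0  # illustrative threshold, not an industry figure


def route_fnol(claim, policy):
    """Verify coverage first, then pick a handling route for the claim."""
    if not (policy.start <= claim.loss_date <= policy.end):
        return "coverage_review"      # loss outside the policy period
    if claim.peril not in policy.covered_perils:
        return "coverage_review"      # peril not covered
    if claim.injuries:
        return "investigation"        # complex claims go to investigation
    if claim.estimated_amount <= AUTO_SETTLE_LIMIT:
        return "auto_settle"          # simple, low-value loss
    return "assign_handler"


policy = Policy(date(2025, 1, 1), date(2025, 12, 31),
                frozenset({"collision", "theft"}))
routes = [
    route_fnol(Claim(date(2025, 6, 1), "collision", 900.0, False), policy),
    route_fnol(Claim(date(2025, 6, 1), "flood", 900.0, False), policy),
    route_fnol(Claim(date(2025, 6, 1), "collision", 900.0, True), policy),
    route_fnol(Claim(date(2025, 6, 1), "collision", 8000.0, False), policy),
]
```

Coverage is checked before anything else so that no claim can reach automatic settlement without a verified policy match.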
Claims fraud detection
Claims fraud — across auto, property, workers' compensation, and health — is estimated by the industry to represent a significant share of paid losses annually, though published figures vary by line and methodology. Agentic fraud systems analyze claims against behavioral patterns, social graph data, third-party intelligence, imagery (for property and auto), and provider billing history. The agent doesn't just score: it gathers supporting evidence, cross-references related claims and claimants, and presents an investigation recommendation to a special investigations unit.
Policy servicing
Endorsements, cancellations, renewals, and certificate issuance are high-volume, low-complexity transactions that are well-suited to agentic automation. The agent interprets the request, validates it against policy terms and system rules, executes the change, and generates confirmation documentation — reducing servicing cost and improving turnaround.
Capital markets: a briefer deployment picture
In capital markets, agentic AI is active in research and analysis (automated earnings summaries, macro event briefings, structured data extraction from filings) and in reconciliation (see above). Trading and execution contexts impose stricter human-in-the-loop requirements and regulatory overlays, and remain at an earlier stage of agentic deployment.
What makes financial services different
Finance is not a permissive environment for autonomous AI. Four factors differentiate deployment here from, say, customer-service automation in e-commerce:
Regulation and model risk
Financial institutions operate under an interlocking set of regulatory requirements that directly constrain how AI models are built, validated, monitored, and governed.
US federal bank regulators revised model risk management guidance in OCC Bulletin 2026-13 and SR 26-02 (April 2026), issued jointly by the OCC, Federal Reserve, and FDIC. Notably, the revised guidance states that generative AI and agentic AI models "are novel and rapidly evolving" and "are not within the scope of this guidance" — and the agencies have signaled a forthcoming request for information on AI specifically. The practical effect is that governance for these systems must be derived from other frameworks rather than the familiar SR 11-7 model inventory process.
The load shifts to voluntary and sector-specific frameworks: the NIST AI Risk Management Framework (AI RMF 1.0) and its GenAI Profile (AI 600-1), which identifies 12 risk categories; the US Treasury Financial Services AI Risk Management Framework (February 2026); and the ISO/IEC 42001:2023 AI management system standard.
In Europe, the EU AI Act (Regulation (EU) 2024/1689), in force since August 2024, classifies creditworthiness assessment and credit scoring of individuals, and risk assessment and pricing in life and health insurance, as high-risk AI systems (Annex III; fraud detection is carved out, and other lines such as P&C may be captured only via other provisions). These applications face conformity assessment, transparency, and human oversight obligations. Under the Act as currently in force, those obligations apply from 2 August 2026 — but a provisional political agreement under the EU Digital Omnibus (Council and Parliament, May 7, 2026) would defer the high-risk deadlines to December 2027 (standalone Annex III systems) and August 2028 (AI embedded in regulated products). That agreement is not yet final law — it takes effect only on formal adoption and publication in the Official Journal — and it does not remove the underlying obligations. Treat the deferral as proposed, not settled.
GDPR Article 22 already applies to automated individual decisions with legal or similarly significant effects — relevant for credit and insurance decisions today.
Explainability and adverse-action obligations
When an AI system contributes to a consequential decision — denying credit, declining an insurance application, flagging a customer for enhanced due diligence — regulators and consumers may have a right to explanation. CFPB Circular 2022-03 confirms that adverse-action notice requirements under ECOA and Regulation B apply even when a credit decision is based on a complex algorithm, and that a creditor's inability to explain its own model is not a defense. Any agentic system in the credit decision path must be able to produce human-intelligible reason codes, not just a black-box score.
NYDFS guidance (Industry Letter, October 2024) specifically addresses AI-related cyber risks — including third-party and vendor vulnerabilities — and frames them within the state's existing cybersecurity regulation (23 NYCRR Part 500) rather than as new rules, sharpening vendor-management and security-testing expectations for New York–licensed entities.
Human-in-the-loop design
Production-grade agentic systems in finance are not fully autonomous: they are designed with defined escalation paths — cases above a risk threshold, outside appetite, or involving novel fact patterns are routed to human reviewers with the agent's analysis pre-populated. The design question is not whether to include human oversight, but where to place the boundary and how to ensure the escalation is reliable.
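One way to make escalation reliable is to fail closed: if the agent cannot read a field it needs, the case goes to a human by default. The sketch below applies the three escalation criteria named above; the field names and the `risk_threshold` default are assumptions for illustration.

```python
def decide_route(case, risk_threshold=0.7):
    """Return (route, reasons). Missing or malformed inputs escalate."""
    reasons = []
    try:
        if case["risk_score"] >= risk_threshold:
            reasons.append("risk_above_threshold")
        if not case["in_appetite"]:
            reasons.append("outside_appetite")
        if case["novel_pattern"]:
            reasons.append("novel_fact_pattern")
    except KeyError as exc:
        # A required field is absent: never default to automation.
        reasons.append(f"missing_field:{exc.args[0]}")
    except TypeError:
        # A field has an unusable type, for example a risk score of None.
        reasons.append("invalid_field_type")
    return ("human" if reasons else "auto", reasons)


clean = decide_route(
    {"risk_score": 0.2, "in_appetite": True, "novel_pattern": False}
)
risky = decide_route(
    {"risk_score": 0.9, "in_appetite": False, "novel_pattern": False}
)
incomplete = decide_route({"risk_score": 0.2})
```

The returned reasons travel with the case, so the human reviewer sees why it was escalated as well as the agent's pre-populated analysis.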
Auditability
Regulators, internal audit, and model risk functions all require that AI-driven decisions be reconstructable after the fact: what data was used, which model version produced the score, what rules were applied, and what the human reviewer saw at the time of decision. Agentic systems that use ephemeral context or do not log intermediate reasoning steps are not suitable for regulated financial workflows.
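As a rough illustration, the four items listed above map naturally onto an immutable record that is written once at decision time and can be reloaded later. The `DecisionRecord` fields and the sample values are hypothetical; the point is only that everything needed to reconstruct the decision is captured together.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    case_id: str
    inputs: dict             # what data was used
    model_version: str       # which model version produced the score
    rules_applied: tuple     # what rules were applied
    shown_to_reviewer: dict  # what the human reviewer saw
    decision: str
    decided_at: str          # ISO 8601 timestamp


def to_json(record):
    """Serialize with sorted keys so stored records are byte-stable."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(payload):
    """Rebuild the record; JSON turns tuples into lists, so convert back."""
    data = json.loads(payload)
    data["rules_applied"] = tuple(data["rules_applied"])
    return DecisionRecord(**data)


record = DecisionRecord(
    case_id="case-001",
    inputs={"risk_score": 0.12, "customer_tenure_months": 48},
    model_version="triage-model-1.4.2",
    rules_applied=("sanctions_screen", "typology_match"),
    shown_to_reviewer={"summary": "Low risk, consistent with payroll history"},
    decision="closed_low_risk",
    decided_at="2025-03-01T12:00:00+00:00",
)
restored = from_json(to_json(record))
```

Freezing the dataclass prevents fields from being reassigned after the fact, although the nested dictionaries remain mutable in this sketch.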
Agentic AI vs. earlier automation approaches
| Dimension | RPA | Traditional ML | Agentic AI |
|---|---|---|---|
| Input handling | Structured, scripted | Structured features | Structured + unstructured |
| Autonomy | Rigid rule-following | Scores/predicts only | Plans, acts, adapts |
| Exception handling | Fails or escalates blindly | Outputs a score; human acts | Gathers evidence; escalates with context |
| Auditability | Step logs | Model outputs | Requires explicit logging design |
| Governance complexity | Low | Moderate | High — model risk + agentic-specific |
The practical implication: agentic AI automates a substantially broader scope of work than RPA and completes substantially more of the workflow than standalone ML, but the governance overhead is correspondingly higher. That overhead is not a reason to avoid deployment; it is a reason to design governance in from the start rather than retrofit it.
From pilot to production: why projects stall
Gartner predicts that more than 40% of agentic AI projects will be canceled by end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. In financial services, the gap between a compelling demo and a production-grade deployment typically comes down to four factors:
- Data quality and access. Agentic systems are only as good as the data they can reach. Fragmented core banking systems, inconsistent data schemas, and missing API surfaces are the most common blockers.
- Integration depth. A real workflow agent needs read/write access to case management, policy admin, or core banking systems — not just a read-only data extract. Gaining those integrations safely takes time and IT governance cycles.
- Governance and validation. Model risk, compliance, and legal reviews add time that teams don't budget for. In a domain where decisions affect people's money and legal rights, this is not optional overhead; it is the work.
- Evaluation discipline. Unlike a classification model with a held-out test set, an agentic workflow's quality is measured over sequences of decisions. Teams that don't establish evaluation criteria before deployment lack the signal to know whether the system is performing.
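To illustrate the evaluation point, the sketch below scores agent runs on two axes: whether the final outcome was right, and whether the required steps actually appear in the trace. The case format and metric names are assumptions for illustration, not an established benchmark.

```python
def evaluate(cases):
    """Score runs on final outcome and on coverage of required steps."""
    n = len(cases)
    outcome_ok = [c["outcome"] == c["expected_outcome"] for c in cases]
    steps_ok = [set(c["required_steps"]) <= set(c["trace"]) for c in cases]
    return {
        "outcome_accuracy": sum(outcome_ok) / n,
        "step_coverage": sum(steps_ok) / n,
        "fully_correct": sum(o and s for o, s in zip(outcome_ok, steps_ok)) / n,
    }


cases = [
    {"expected_outcome": "close", "outcome": "close",
     "required_steps": ["screen", "enrich"],
     "trace": ["enrich", "screen", "close"]},
    # Right outcome, but the agent skipped screening.
    {"expected_outcome": "escalate", "outcome": "escalate",
     "required_steps": ["screen", "enrich"],
     "trace": ["enrich", "escalate"]},
    # All steps run, but the wrong final outcome.
    {"expected_outcome": "escalate", "outcome": "close",
     "required_steps": ["screen"],
     "trace": ["screen", "close"]},
    {"expected_outcome": "close", "outcome": "close",
     "required_steps": ["screen"],
     "trace": ["screen", "close"]},
]
report = evaluate(cases)
```

The second case shows why outcome accuracy alone is not enough: the agent reached the right answer without running a required check, which only the step-level metric reveals.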
Production deployments that succeed start with a narrow, well-defined workflow where the agent's actions are reversible or supervised; they instrument everything from day one; and they treat governance as a first-class deliverable alongside the model.
The governance and integration work that separates demos from deployments is where most pilots stall.
Frequently asked questions
- What is agentic AI in financial services?
- Agentic AI refers to AI systems that can perceive data, reason over it, plan a multi-step course of action, execute that plan — often by calling tools or APIs — and adapt based on feedback, without requiring a human to direct every step. In financial services, this means systems that can work an AML alert from triage through case closure, process an insurance claim from FNOL to settlement recommendation, or run a credit underwriting workflow end-to-end, while escalating edge cases to a human reviewer.
- What are the top agentic AI use cases in banking?
- The highest-value banking use cases are: fraud detection and investigation (real-time scoring plus automated evidence gathering); AML/KYC alert triage and case closure (reducing analyst workload on low-risk alerts); credit underwriting and loan processing (automated document extraction, bureau pull, policy check, and decision); and reconciliation (matching transactions across systems at scale). Customer service automation is also widely deployed but tends to be earlier-stage on the agentic maturity curve.
- What are the top agentic AI use cases in insurance?
- Insurance's leading use cases are: straight-through processing (STP) in commercial and personal lines underwriting (reducing manual touchpoints from intake to bind); first-notice-of-loss (FNOL) intake and triage in claims; claims fraud detection (pattern analysis across claims history, third-party data, and imagery); and policy servicing (endorsements, cancellations, renewals). Claims-related automation is the most heavily invested category across the industry.
- Is agentic AI compliant and safe for regulated finance?
- Agentic AI can be deployed in compliance with applicable regulations, but it requires deliberate architecture choices: explainable decision logic, human-in-the-loop escalation for consequential decisions, full audit trails, and adversarial/bias testing. US model risk guidance (OCC 2026-13 / SR 26-02, April 2026) was recently revised but states that generative and agentic AI are "not within the scope of this guidance," meaning governance must draw on the NIST AI RMF, the US Treasury Financial Services AI Risk Management Framework, and existing law such as CFPB adverse-action requirements and GDPR Article 22. The EU AI Act places credit scoring and life/health insurance pricing in its high-risk tier; compliance deadlines for those obligations have been provisionally deferred (Digital Omnibus) but are not yet final law.
- How is agentic AI different from RPA or traditional machine learning?
- RPA follows rigid, rule-based scripts and breaks on variation. Traditional ML produces a score or prediction but does not act on it. Agentic AI combines reasoning, tool use, and autonomous action: it can interpret unstructured inputs, decide which steps to take, call external systems, handle exceptions, and self-correct — within defined guardrails. The practical difference is task completion versus task assistance: an agentic system works an alert or a claim to a resolution, not just a flag.
Sources
- Gartner, Over 40% of agentic AI projects canceled by end of 2027 (Jun 25, 2025)
- Gartner, 40% of enterprise apps to feature task-specific AI agents by 2026 (Aug 26, 2025)
- McKinsey, The paradigm shift: how agentic AI is redefining banking operations (2025–26)
- McKinsey, State of AI trust in 2026: shifting to the agentic era
- OCC Bulletin 2026-13, Model Risk Management: Revised Guidance (Apr 17, 2026)
- EU AI Act — Regulation (EU) 2024/1689; in force Aug 1, 2024
- EU Digital Omnibus provisional agreement — Council of the EU (May 7, 2026)
- CFPB Circular 2022-03, Adverse action notification requirements and complex algorithms (May 26, 2022)
- GDPR Article 22 — automated individual decision-making
- NYDFS Industry Letter on AI cyber risks (Oct 16, 2024)
- NIST AI RMF 1.0 (Jan 26, 2023) and GenAI Profile AI 600-1 (Jul 26, 2024)
- US Treasury, Financial Services AI Risk Management Framework (Feb 19, 2026)
- ISO/IEC 42001:2023 — AI management systems
- IBM, Agentic AI vs. generative AI