In 2024 we shipped an agentic AI portfolio assistant at BeyondIRR. It can answer questions like "How has my equity allocation changed in the last 6 months?" or "Which funds in my portfolio are underperforming their benchmark?" by reasoning over real financial data.

The demo looked great. Getting it to production without hallucinating financial data, violating DPDP Act requirements, or giving clients terrible investment advice — that was the real challenge.

What We Built

The system is a ReAct-style agent with a curated set of tools: a portfolio query tool (read-only SQL access), a market data tool (current prices, fund details), and a calculation tool (XIRR, CAGR, allocation percentages). The LLM plans, the tools execute, the LLM synthesises.

# A curated toolset: each tool has a narrow, auditable surface area
tools = [
    PortfolioQueryTool(db=read_replica),      # read-only SQL against a replica
    MarketDataTool(sources=["amfi", "nse"]),  # current prices and fund details
    CalculatorTool(allowed=["xirr", "cagr"]), # whitelisted financial maths only
]

agent = ReActAgent(
    llm=claude_sonnet,
    tools=tools,
    max_steps=8,                       # bound the plan/act loop
    guardrails=FinancialGuardrails(),  # Critical
)

The Guardrails Problem

The hardest part was not building the agent — it was preventing it from doing things it shouldn't. In fintech, an LLM that confidently states the wrong NAV, or gives implicit investment advice, is a regulatory and reputational risk.

Every LLM response in a financial context needs to be treated as a potential compliance liability. Design accordingly.

We implemented guardrails in three layers.
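The exact rules are specific to our compliance requirements, but a layered guardrail wrapper generally looks like the sketch below. The three layers chosen here (intent screening on the question, numeric grounding against tool outputs, and an advice-language filter on the response) are illustrative assumptions, not our production rules.

```python
import re

class GuardrailViolation(Exception):
    pass

class FinancialGuardrails:
    """Illustrative layered checks; real rules are compliance-specific."""

    ADVICE_PATTERNS = [r"\byou should (buy|sell)\b", r"\bI recommend\b"]

    def check_input(self, question: str) -> None:
        # Layer 1: refuse requests that explicitly ask for investment advice.
        if re.search(r"\b(should i|recommend).*(buy|sell|invest)\b", question, re.I):
            raise GuardrailViolation("Advice requests are out of scope.")

    def check_grounding(self, answer: str, tool_outputs: list[str]) -> None:
        # Layer 2: every figure in the answer must appear in a tool output,
        # so the LLM cannot hallucinate an NAV or a return number.
        evidence = " ".join(tool_outputs)
        for number in re.findall(r"\d+(?:\.\d+)?", answer):
            if number not in evidence:
                raise GuardrailViolation(f"Ungrounded figure: {number}")

    def check_output(self, answer: str) -> None:
        # Layer 3: block advice-like language in the final response.
        for pattern in self.ADVICE_PATTERNS:
            if re.search(pattern, answer, re.I):
                raise GuardrailViolation("Response reads as investment advice.")

# Usage
guards = FinancialGuardrails()
guards.check_input("How has my equity allocation changed?")    # passes
guards.check_grounding("Your XIRR is 11.4%", ["XIRR = 11.4"])  # passes
try:
    guards.check_grounding("Your XIRR is 12.1%", ["XIRR = 11.4"])
except GuardrailViolation as e:
    print(e)  # Ungrounded figure: 12.1
```

The grounding layer is the one that earns its keep: it converts "the model probably used the tool output" into an enforced invariant.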

The Latency Problem

Multi-step agentic workflows are slow. Our first version took 12-18 seconds end to end. We got this down to 3-4 seconds through aggressive caching, parallel tool calls where possible, and streaming the response as tokens arrive.

The result: a chatbot that genuinely helps clients understand their portfolio, without the risk of a rogue LLM steering them toward the wrong financial conclusions.