
What Large Language Models Actually Do and Why Financial Institutions Should Care

Large language models are reshaping how financial institutions process information, manage compliance workflows, and analyse risk. This article explains how LLMs actually work, where they are genuinely useful in financial services, and where their limitations demand careful human oversight.


Artificial intelligence has become one of the most discussed topics in financial services. Boards are being briefed on it. Strategy documents reference it. Vendors are attaching it to every product pitch. And yet, in most of the financial institutions we speak with, the senior professionals responsible for risk, compliance, and investment decisions have a surprisingly limited understanding of what AI systems, and large language models in particular, actually do under the hood.

This matters because the gap between the hype and the reality of AI creates two equally damaging failure modes. The first is uncritical adoption: deploying AI tools without understanding their limitations, and discovering those limitations in consequential situations. The second is reflexive rejection: dismissing AI as a passing trend or an unacceptable risk, and missing the genuine analytical advantages it offers to institutions that deploy it thoughtfully.

This article is for the professionals in the middle: those who want a grounded, honest understanding of what large language models are, how they work, where they are genuinely useful, and where they are not. No technical background is required.

What a Large Language Model Actually Is

A large language model is a type of artificial intelligence system trained to process and generate text. The word "large" refers to the scale of the system: modern LLMs are trained on enormous quantities of text drawn from books, articles, websites, research papers, and other written sources, and they contain billions of mathematical parameters that encode patterns learned from that training data.

The word "model" refers to the fact that the system is, at its core, a mathematical representation of statistical relationships between words, phrases, and concepts. An LLM does not understand language in the way a human does. It has learned, through exposure to vast quantities of text, which words and ideas tend to appear together, which arguments tend to follow which premises, and which answers tend to follow which questions. When you ask an LLM a question, it generates a response by predicting, one token at a time, what text is most likely to follow the input it has received.

This distinction between statistical prediction and genuine understanding is the single most important thing for financial professionals to grasp about LLMs. It explains both why they are extraordinarily capable at certain tasks and why they fail in certain ways that a human expert would not.

How Training Works

LLMs are trained in two broad phases.

The first phase is pre-training. The model is exposed to a vast corpus of text and learns to predict the next word in a sequence, over and over again, billions of times. Through this process, the model develops a rich internal representation of language: grammar, vocabulary, factual associations, logical patterns, stylistic registers, and much more. Pre-training is computationally intensive and expensive, requiring specialised hardware running for weeks or months.

The second phase is fine-tuning and alignment. The pre-trained model is further trained on curated datasets and through a process of human feedback (often called reinforcement learning from human feedback, or RLHF) to make it more useful, accurate, and safe for real-world deployment. This phase shapes the model's behaviour: how it responds to ambiguous questions, what it declines to do, how it handles uncertainty, and how it presents information to users.
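
For readers who want to see the pre-training objective in miniature, the sketch below builds a toy next-word "model" from simple word counts and scores it with the same cross-entropy loss that real pre-training minimises. Everything here, including the eleven-word corpus, is invented for illustration.

import math
from collections import Counter, defaultdict

# An invented eleven-word corpus; real pre-training uses trillions of tokens.
corpus = "the fund holds the bond and the fund holds the equity".split()

# Count how often each word follows each preceding word (a toy bigram model).
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_prob(prev, nxt):
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total

# The pre-training objective in miniature: average negative log-probability
# assigned to each word that actually comes next. Training adjusts the
# model's parameters to push this number down.
loss = -sum(math.log(next_word_prob(p, n))
            for p, n in zip(corpus, corpus[1:])) / (len(corpus) - 1)
print(f"cross-entropy loss: {loss:.3f}")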

The result is a system that can engage in sophisticated, contextually aware conversation across an enormous range of topics, from drafting a regulatory disclosure summary to explaining a complex financial instrument to analysing a document for key risk factors. The breadth of capability is genuinely remarkable. The limitations, however, are equally real.

What LLMs Are Good At

For financial institutions, the practical value of LLMs falls into several well-defined categories.

Processing and summarising large volumes of text. Reading, synthesising, and extracting key information from long documents is one of the most time-consuming tasks in financial services. LLMs can process regulatory consultations, company reports, research papers, and compliance documents in seconds, producing summaries, extracting specific data points, and flagging relevant passages. A task that might take an analyst half a day can be completed in minutes.

Generating structured outputs from unstructured inputs. Many sustainability and risk workflows involve translating unstructured information, including company disclosures, news articles, and regulatory guidance, into structured formats for analysis. LLMs can do this at scale, enabling institutions to process far more information than would be possible manually.
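
A sketch of what that pattern looks like in practice is below. The llm_complete function is a hypothetical stand-in for whichever model API an institution actually uses, and the canned response exists only so the example runs end to end.

import json

def llm_complete(prompt):
    # Hypothetical stand-in for whichever LLM API your institution uses.
    # Returns a canned response here so the sketch runs end to end.
    return ('{"issuer": "Example plc", "scope_1_emissions_tco2e": 42000, '
            '"has_transition_plan": true}')

EXTRACTION_PROMPT = (
    "Extract the following fields from the disclosure below and return "
    "ONLY valid JSON with keys: issuer, scope_1_emissions_tco2e, "
    "has_transition_plan.\n\nDisclosure:\n{text}"
)

def extract_structured(disclosure_text):
    raw = llm_complete(EXTRACTION_PROMPT.format(text=disclosure_text))
    record = json.loads(raw)  # fails loudly if the model returned malformed JSON
    # Validate before anything downstream consumes the record.
    if not isinstance(record.get("scope_1_emissions_tco2e"), (int, float)):
        raise ValueError("Emissions field missing or non-numeric; route to human review.")
    return record

print(extract_structured("Example plc reports Scope 1 emissions of 42,000 tCO2e ..."))

The essential design choice is the validation step: structured outputs are only useful at scale if malformed or implausible responses are caught before they enter an analytical pipeline.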

Drafting and editing. LLMs are highly capable writing assistants. For institutions producing large volumes of disclosure documentation, regulatory submissions, client communications, and internal reports, LLM-assisted drafting can significantly reduce the time and effort involved while maintaining quality. The outputs require human review and editing, but the starting point is dramatically better than a blank page.

Question answering over large document sets. Through a technique called retrieval-augmented generation (RAG), LLMs can be connected to an institution's internal document repositories and databases, allowing users to ask questions in natural language and receive answers grounded in specific internal sources. This has significant applications in compliance, where staff need to quickly locate relevant policy provisions, regulatory requirements, or precedent decisions.
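
The sketch below shows the shape of a RAG pipeline under heavy simplification: word-overlap scoring stands in for a real embedding-based retriever, and llm_complete is again a hypothetical stand-in for a model API.

# Toy document store; in practice this would be an institution's
# policy repository behind an embedding-based search index.
POLICY_DOCS = {
    "AML-4.2": "Enhanced due diligence applies to politically exposed persons.",
    "ESG-1.1": "Article 8 funds must disclose how ESG characteristics are promoted.",
    "RISK-2.7": "Counterparty exposure limits are reviewed quarterly.",
}

def llm_complete(prompt):
    # Hypothetical stand-in for a model API, as in the earlier sketch.
    return "Enhanced due diligence applies to politically exposed persons [AML-4.2]."

def retrieve(query, k=2):
    # Rank documents by words shared with the query. A real system would
    # use embeddings; word overlap keeps the sketch self-contained.
    q = set(query.lower().split())
    ranked = sorted(POLICY_DOCS.items(),
                    key=lambda item: len(q & set(item[1].lower().split())),
                    reverse=True)
    return ranked[:k]

def answer(query):
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query))
    prompt = (f"Answer using ONLY the sources below, citing the source ID.\n"
              f"Sources:\n{sources}\n\nQuestion: {query}")
    return llm_complete(prompt)

print(answer("What due diligence applies to politically exposed persons?"))

The point of the pattern is that the model's answer is grounded in, and citable to, specific internal sources rather than its pre-trained memory.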

Regulatory monitoring and interpretation. The sustainable finance regulatory landscape produces a continuous stream of new guidance, consultation papers, framework updates, and supervisory expectations. LLMs can monitor and summarise these developments at scale, flagging relevant changes and providing initial interpretations that human experts then review and contextualise.

What LLMs Are Not Good At

Understanding the limitations of LLMs is as important as understanding their capabilities, particularly for institutions where errors have regulatory, financial, or reputational consequences.

LLMs can confabulate. This is the phenomenon widely described as hallucination: the tendency of LLMs to generate plausible-sounding but factually incorrect information. Because LLMs generate text by predicting what is likely to come next rather than by retrieving verified facts, they can produce confident-sounding statements that are simply wrong. They may invent citations, misstate regulatory requirements, or fabricate numerical data. In a general conversation, a confabulated answer is an inconvenience. In a regulatory compliance context, it can be a serious problem.

LLMs have a knowledge cutoff. Pre-trained models have a training data cutoff date, after which they have no knowledge of new developments. Regulatory frameworks, market conditions, and supervisory guidance change continuously. An LLM relying on its pre-trained knowledge base for regulatory interpretation may be working with outdated information. This is why retrieval-augmented approaches, which connect the model to current external sources, are important for compliance-sensitive applications.

LLMs cannot reason reliably about numbers. Mathematical reasoning is not what LLMs are optimised for. They can describe quantitative concepts, explain methodologies, and format numerical data, but they should not be relied upon to perform complex calculations or numerical analysis without verification. For quantitative risk analytics, LLMs are best used as an interface layer rather than as the analytical engine.
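
One common way to respect this boundary in system design is to let the model translate a question into a structured calculation request while deterministic, testable code performs the arithmetic. The sketch below illustrates the pattern; all names and figures are invented.

import json

def llm_complete(prompt):
    # Hypothetical stand-in for a model API. The LLM's only job here is to
    # translate a natural-language question into a structured request.
    return '{"operation": "portfolio_var", "confidence": 0.99, "horizon_days": 10}'

def portfolio_var(request):
    # Deterministic, testable, auditable code does the actual maths.
    # The 1.62m base figure is a placeholder for a real risk engine's output.
    return round(1_620_000 * (request["horizon_days"] / 10) ** 0.5, 2)

CALCULATORS = {"portfolio_var": portfolio_var}

def answer_quant_question(question):
    request = json.loads(
        llm_complete(f"Translate into a calculation request: {question}"))
    return CALCULATORS[request["operation"]](request)

# The model never produces the number itself; it only routes the request.
print(answer_quant_question("What is our 10-day 99% VaR?"))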

LLMs reflect the biases in their training data. The patterns learned during pre-training reflect the biases, assumptions, and gaps in the text corpus the model was trained on. For applications involving ESG assessment, credit analysis, or any domain where bias has material consequences, the outputs of LLMs require careful human review.

LLMs are not auditable in the traditional sense. The internal workings of large language models are not transparent in the way that a rules-based system or a conventional analytical model is. Explaining exactly why an LLM produced a particular output is technically challenging. For applications where auditability and explainability are regulatory requirements, this creates real constraints on how LLMs can be deployed.

The Human-in-the-Loop Principle

The limitations above do not make LLMs unsuitable for financial services. They define the conditions under which LLMs can be deployed responsibly.

The principle that emerges from serious AI deployment in regulated industries is what practitioners call human-in-the-loop design: structuring AI-assisted workflows so that human expertise remains in the decision-making role, while AI handles the data processing, summarisation, and analytical support that would otherwise consume disproportionate human time.

In practice this means the following. An LLM can summarise a 200-page company sustainability report and extract the key risk disclosures. A human analyst reviews the summary, applies their contextual knowledge of the company and sector, and reaches a conclusion. The LLM has not made the risk assessment. It has dramatically reduced the time required for the analyst to make an informed one.
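
In system terms, the principle can be enforced structurally: AI output is a draft object that cannot reach a decision system without a named human sign-off. A schematic sketch, with invented names, follows.

from dataclasses import dataclass

@dataclass
class DraftAssessment:
    # AI output is always a draft; it carries no authority on its own.
    summary: str
    flagged_risks: list
    reviewed_by: str = ""
    approved: bool = False

def human_review(draft, analyst, approve):
    # The only path to an approved assessment runs through a named analyst.
    draft.reviewed_by = analyst
    draft.approved = approve
    return draft

def commit_to_risk_system(draft):
    if not (draft.approved and draft.reviewed_by):
        raise PermissionError("Unreviewed AI output cannot enter the risk system.")
    print(f"Assessment recorded, signed off by {draft.reviewed_by}.")

draft = DraftAssessment(summary="Key transition risks: carbon pricing exposure ...",
                        flagged_risks=["carbon pricing"])
commit_to_risk_system(human_review(draft, analyst="J. Analyst", approve=True))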

This distinction between AI as an analytical infrastructure layer and AI as a decision-maker is not merely a philosophical one. It is increasingly what regulators expect to see when they examine how financial institutions are integrating AI into material risk and compliance processes.

Why This Matters Specifically for Sustainable Finance

The intersection of LLMs and sustainable finance is particularly significant for two reasons.

First, sustainable finance is an information-intensive domain. TCFD reports, TNFD assessments, CSRD disclosures, ESG data providers, regulatory consultations, science-based target documentation, transition plans: the volume of text that needs to be processed, synthesised, and acted upon by sustainability teams is growing faster than headcount. LLMs offer a genuine solution to a genuine operational problem.

Second, the quality of AI outputs in sustainable finance depends critically on the quality of the underlying methodology. An LLM connected to rigorous, independently developed analytical frameworks for climate and nature risk will produce outputs that are defensible and decision-useful. An LLM connected to generic ESG data of variable quality will produce outputs that are superficially impressive but analytically shallow. The methodology is the intellectual asset that determines whether AI in sustainable finance is genuinely useful or merely efficient at producing the appearance of analysis.

This is why the most credible AI applications in sustainable finance are those developed by organisations with deep subject matter expertise in both the financial and environmental dimensions of the problem, rather than technology firms applying general-purpose AI to a domain they understand superficially.

What This Means for Your Institution

LLMs are not a technology to adopt uncritically or dismiss reflexively. They are a powerful analytical tool with well-defined capabilities and well-defined limitations, and the institutions that will get the most value from them are those that understand both.

The starting point for most financial institutions is not a wholesale AI transformation. It is identifying the two or three specific workflows where the combination of high document volume, time pressure, and analytical complexity makes LLM-assisted processing genuinely valuable. Regulatory monitoring, sustainability disclosure drafting, and ESG data extraction are typically the highest-value starting points.

From there, the critical investment is not in the technology itself but in the governance, oversight, and methodology that ensure AI outputs are reliable, auditable, and fit for the regulatory context in which they will be used.

The institutions that build that capability now, thoughtfully and with appropriate human oversight, will have a meaningful analytical advantage over those that either adopt AI carelessly or avoid it altogether.

Artamis combines human expertise with AI-powered technology to help financial institutions manage climate and nature risk, meet regulatory obligations, and deploy capital with purpose. If the themes in this article are relevant to your institution, speak to our Advisory team or request access to our Intelligence products.
