Global systemically important banks (G-SIBs) generate far more than numbers: every earnings call, analyst Q&A, and executive comment is a window into potential risk. Yet much of this information remains buried in lengthy, unstructured transcripts that traditional analytics cannot easily reach.
As part of a team project with the Bank of England’s Prudential Regulation Authority (PRA), we set out to change that.
Our mission: build a robust, end-to-end NLP pipeline capable of extracting early warning signals of risk from over a decade of earnings call transcripts. By combining sentiment analysis, topic modelling, evasiveness detection, and a custom Retrieval-Augmented Generation (RAG) assistant, we created a tool that turns qualitative dialogue into structured insight, helping regulators spot emerging threats before they hit the headlines.
In this post, I’ll walk through how we built it, what we found, and why this matters for the future of financial supervision.
- Project Definition
- Jupyter Notebook
- Report
🔍 Overview
Team: STOMPA | Date: June 2025
Client: Bank of England – Prudential Regulation Authority (PRA)
This project aimed to support the Bank of England’s Prudential Regulation Authority in detecting early warning signals of risk within global systemically important banks (G-SIBs) by analysing unstructured text from quarterly earnings call transcripts. Traditional financial metrics often miss signals carried in the language executives use, such as tone, topic emphasis, and evasiveness; our solution applied natural language processing (NLP) techniques to close that gap.
💡 Objective
To design an end-to-end pipeline that transforms financial transcripts into actionable, risk-aware insights, ultimately enhancing supervisory decision-making.
🧠 Key Components
- Sentiment & Risk Detection:
Combined rule-based heuristics, FinBERT, and Gemini to classify risk sentiment at sentence level across Presentation and Q&A sections of earnings calls.
- Aspect-Based Sentiment Analysis:
Used DeBERTa-v3 ABSA to track sentiment trends across financial themes like capital adequacy, loan book quality, and interest rates.
- Topic Modelling:
Applied BERTopic and GPT-4.1 to extract and summarise high-risk themes such as commercial real estate exposure and net interest margin pressure.
- Question Avoidance Detection:
Leveraged Gemini-2.0-Flash to evaluate whether executives deflected analyst questions, revealing patterns in transparency and communication strategy.
- Retrieval-Augmented Generation (RAG):
Built a chatbot using LangChain and vector databases to answer complex queries with regulatory context, linking Basel Framework docs and transcript data.
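To make the sentence-level idea concrete, here is a minimal sketch of only the rule-based part of the first component, using nothing beyond the Python standard library. It is a toy illustration, not the project's implementation: the keyword list `RISK_TERMS`, the function names, and the sample transcript are all hypothetical, and the real pipeline combined heuristics like this with FinBERT and Gemini.

```python
import re

# Hypothetical keyword list for illustration only; not the project's lexicon.
RISK_TERMS = {"impairment", "default", "litigation", "writedown", "losses", "uncertainty"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def flag_sentence(sentence):
    """Return the sorted risk terms that appear in a sentence (case-insensitive)."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return sorted(tokens & RISK_TERMS)


def risk_share_by_section(transcript):
    """Map each section name to the share of its sentences with at least one risk term."""
    shares = {}
    for section, text in transcript.items():
        sentences = split_sentences(text)
        flagged = sum(1 for s in sentences if flag_sentence(s))
        shares[section] = flagged / len(sentences) if sentences else 0.0
    return shares


# Made-up example text, split into the two sections of an earnings call.
transcript = {
    "Presentation": "Revenue grew this quarter. Capital ratios remain strong.",
    "Q&A": "We expect higher impairment charges. Litigation costs are a source of uncertainty.",
}

shares = risk_share_by_section(transcript)
print(shares)
```

Scoring Presentation and Q&A separately lets the two sections be compared directly. A keyword heuristic like this is cheap and transparent but misses context and negation, which is one reason to pair it with model-based classifiers.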
📈 Results
- Identified high-risk communication trends across UBS, JPMorgan, and Citibank.
- Discovered evasive Q&A patterns during periods of operational or reputational stress.
- Created a Streamlit dashboard and AI assistant that enabled non-experts to interpret financial risk data effectively.
🛠️ Tech Stack
Python, PyMuPDF, FinBERT, DeBERTa, OpenAI GPT-4, LangChain, Streamlit, XGBoost, SARIMA, BERTopic, Gemini, spaCy