Finance Language Model Evaluation (FLaME)

Glenn Matlin; Mika Okamoto; Huzaifa Pardawala; Yang Yang; Sudheer Chava

arXiv:2506.15846 · cs.CL · June 23, 2025

Open Access

TL;DR

This paper introduces FLaME, a benchmarking suite for evaluating language models on financial NLP tasks. It addresses methodological gaps in existing evaluation frameworks and reports an empirical study of 23 foundation LMs across 20 finance tasks.

Contribution

The paper presents what the authors describe as the first holistic evaluation framework for FinNLP, together with a comparative study of 23 foundation LMs on 20 finance-specific NLP tasks.

Findings

01. LMs show promising performance on finance NLP tasks

02. Reasoning-reinforced LMs outperform standard models

03. Open-source framework enables reproducibility and further research

Abstract

Language Models (LMs) have demonstrated impressive capabilities with core Natural Language Processing (NLP) tasks. The effectiveness of LMs for highly specialized knowledge-intensive tasks in finance remains difficult to assess due to major gaps in the methodologies of existing evaluation frameworks, which have caused an erroneous belief in a far lower bound of LMs' performance on common Finance NLP (FinNLP) tasks. To demonstrate the potential of LMs for these FinNLP tasks, we present the first holistic benchmarking suite for Financial Language Model Evaluation (FLaME). We are the first research paper to comprehensively study LMs against 'reasoning-reinforced' LMs, with an empirical study of 23 foundation LMs over 20 core NLP tasks in finance. We open-source our framework software along with all data and results.

Taxonomy

Topics: Stock Market Forecasting Methods · Explainable Artificial Intelligence (XAI) · Topic Modeling