Evaluating Large Language Models (LLMs) in Financial NLP: A Comparative Study on Financial Report Analysis

Md Talha Mohsin

arXiv:2507.22936 · cs.CL · January 21, 2026

TL;DR

This study systematically evaluates five transformer-based large language models on question answering over the Business sections of U.S. 10-K filings. It finds notable variability in their performance, behavior, and reliability, underscoring the need for comprehensive evaluation frameworks in high-stakes financial NLP tasks.

Contribution

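The study provides a controlled, multi-faceted evaluation of LLMs in financial NLP. It combines human evaluation, automated similarity metrics, and behavioral diagnostics, highlighting behavioral differences between models and the value of multi-method assessment.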

Findings

01. Models differ in relevance, accuracy, and clarity.
02. Automated metrics show systematic lexical and semantic differences.
03. Response stability varies across models and prompts.
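This summary refers to lexical differences and response stability but does not say how either is computed. Purely as an illustration, the sketch below assumes stability is scored as the mean pairwise unigram-overlap F1 (similar in spirit to ROUGE-1) among repeated responses to the same prompt. The whitespace tokenizer and the `token_f1` and `response_stability` helpers are hypothetical and are not taken from the paper, which may rely on other lexical or embedding-based metrics.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()


def token_f1(a: str, b: str) -> float:
    """Unigram-overlap F1 between two texts (hypothetical helper)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(ta)
    recall = overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


def response_stability(responses: list[str]) -> float:
    """Mean pairwise token F1 across repeated answers to one prompt (hypothetical)."""
    if len(responses) < 2:
        return 1.0  # a single response is trivially consistent with itself
    return mean(token_f1(a, b) for a, b in combinations(responses, 2))


# Hypothetical repeated answers from one model to the same 10-K question.
runs = [
    "Revenue grew due to cloud demand.",
    "Revenue grew due to cloud demand.",
    "Sales rose on strong cloud demand.",
]
print(response_stability(runs))
```

Scores near 1.0 mean repeated runs reuse nearly the same wording, while lower scores indicate more variable answers. Because pure token overlap ignores paraphrase, lexical scores like this are usually paired with semantic similarity measures.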

Abstract

Large language models (LLMs) are increasingly used to support the analysis of complex financial disclosures, yet their reliability, behavioral consistency, and transparency remain insufficiently understood in high-stakes settings. This paper presents a controlled evaluation of five transformer-based LLMs applied to question answering over the Business sections of U.S. 10-K filings. To capture complementary aspects of model behavior, we combine human evaluation, automated similarity metrics, and behavioral diagnostics under standardized and context-controlled prompting conditions. Human assessments indicate that models differ in their average performance across qualitative dimensions such as relevance, completeness, clarity, conciseness, and factual accuracy, though inter-rater agreement is modest, reflecting the subjective nature of these criteria. Automated metrics reveal systematic…
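The abstract reports modest inter-rater agreement, but this excerpt does not name the statistic used. A common choice for two raters is Cohen's kappa, which corrects raw agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below assumes two annotators score the same answers on a discrete scale (for example, 1 to 5). The function name and the sample scores are hypothetical, and for ordinal scores a weighted kappa or Krippendorff's alpha may be more appropriate.

```python
from collections import Counter


def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # p_o: fraction of items on which the two raters gave the same score.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: chance agreement from each rater's marginal score frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical clarity scores from two annotators on four answers.
print(cohens_kappa([1, 1, 1, 2], [1, 1, 2, 2]))  # 0.5
```

Kappa equals 1 for perfect agreement and 0 when observed agreement is no better than chance, so it is stricter than the raw percentage of matching scores.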

Taxonomy

Topics: Stock Market Forecasting Methods