RealFin: How Well Do LLMs Reason About Finance When Users Leave Things Unsaid?

Yuyang Dai; Yan Lin; Zhuohan Xie; Yuxia Wang

arXiv:2602.07096·q-fin.ST·April 28, 2026

TL;DR

This paper introduces REALFIN, a bilingual benchmark for evaluating financial reasoning in language models. It tests whether a model can recognize that information is missing and that no answer can be justified, not only whether it can solve a fully specified question.

Contribution

The paper presents a new benchmark that systematically tests models' ability to identify missing premises and reject unjustified answers in financial reasoning tasks.

Findings

01. Models perform worse when key information is missing.
02. General-purpose models tend to over-commit and guess.
03. Most finance-specialized models struggle to identify missing premises.

Abstract

Reliable financial reasoning requires knowing not only how to answer, but also when an answer cannot be justified. In real financial practice, problems often rely on implicit assumptions that are taken for granted rather than stated explicitly, causing problems to appear solvable while lacking enough information for a definite answer. We introduce REALFIN, a bilingual benchmark that evaluates financial reasoning by systematically removing essential premises from exam-style questions while keeping them linguistically plausible. Based on this, we evaluate models under three formulations that test answering, recognizing missing information, and rejecting unjustified options, and find consistent performance drops when key conditions are absent. General-purpose models tend to over-commit and guess, while most finance-specialized models fail to clearly identify missing premises. These results…
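The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the `Question` class, the `drop_premise` and `formulations` helpers, the wording of the extra options, and the bond example are all hypothetical, and the way the three formulations are rendered here is only one plausible reading of "answering, recognizing missing information, and rejecting unjustified options".

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class Question:
    """Hypothetical container for an exam-style multiple-choice item."""
    premises: Tuple[str, ...]  # explicitly stated conditions
    ask: str                   # the question sentence itself
    options: Tuple[str, ...]   # candidate answers


def drop_premise(q: Question, index: int) -> Question:
    """Return a copy of q with one stated premise removed."""
    kept = q.premises[:index] + q.premises[index + 1:]
    return replace(q, premises=kept)


def render(q: Question, extra_options: Tuple[str, ...] = ()) -> str:
    """Lay the item out as plain text with lettered options."""
    lines = list(q.premises) + [q.ask]
    for label, text in zip("ABCDEFGH", q.options + extra_options):
        lines.append(f"{label}. {text}")
    return "\n".join(lines)


def formulations(q: Question) -> Dict[str, str]:
    """Three assumed prompt variants for the same (possibly incomplete) item."""
    return {
        # Plain answering: the model must pick one of the given options.
        "answer": render(q),
        # Recognizing missing information: an explicit escape option is offered.
        "detect_missing": render(q, ("Not enough information to answer",)),
        # Rejecting unjustified options: the model may refuse all candidates.
        "reject_options": render(q, ("None of the options can be justified",)),
    }


# Hypothetical item: without the market yield, the price is not determined.
full = Question(
    premises=(
        "A bond has a face value of 1,000 and pays a 5% annual coupon.",
        "It matures in 3 years.",
        "The market yield is 4%.",
    ),
    ask="What is the bond's price today?",
    options=("1,027.75", "1,000.00", "950.00"),
)
incomplete = drop_premise(full, 2)
prompts = formulations(incomplete)
```

The incomplete item still reads as a normal exam question, which is the point: a model that answers the `"answer"` variant confidently is guessing, while the other two variants check whether it notices the missing yield when given the chance to say so.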
