Financial Instruction Following Evaluation (FIFE)

Glenn Matlin; Siddharth; Anirudh JM; Aditya Shukla; Yahya Hassan; Sudheer Chava

arXiv:2512.08965 · cs.LG · December 11, 2025

TL;DR

FIFE is a new benchmark for evaluating how well language models follow complex financial instructions. It reveals significant gaps in current models' performance and underscores the need for improved reinforcement learning methods.

Contribution

We introduce FIFE, a high-difficulty benchmark for financial instruction following with a system for verifying constraint compliance, and evaluate 53 models to highlight their current limitations.

Findings

01

The top open-weight model (76.1 strict / 79.5 loose) outperforms the leading proprietary system (65.9 strict / 70.5 loose) in both the strict and loose settings.

02

All models struggle to fully comply with FIFE's complex instructions.

03

The best open-source models lag significantly behind both open-weight and proprietary models (45.5 strict / 48.9 loose).

Abstract

Language Models (LMs) struggle with complex, interdependent instructions, particularly in high-stakes domains like finance where precision is critical. We introduce FIFE, a novel, high-difficulty benchmark designed to assess LM instruction-following capabilities for financial analysis tasks. FIFE comprises 88 human-authored prompts and employs a verification system with chainable, verifiable constraints for fine-grained reward signals. We evaluate 53 models (proprietary, open-weight, open-source) in a zero-shot setting. Our key findings reveal a clear performance hierarchy: the top open-weight model (76.1 strict / 79.5 loose) surpasses the leading proprietary system (65.9 strict / 70.5 loose), while the best open-source models lag significantly (45.5 strict / 48.9 loose). However, even top-performing models struggle with FIFE's complex requirements, failing to achieve perfect…
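
The abstract does not specify how FIFE's chainable constraints or its strict and loose scores are computed. The sketch below assumes an IFEval-style scheme: each prompt carries a chain of independently checkable constraints, strict scoring checks the raw response, loose scoring also accepts lightly normalized variants (asterisks removed, first or last line dropped), and a prompt counts as passed only if every constraint in its chain holds. The constraint helpers (`max_words`, `must_include`, `no_bullets`), the `evaluate` and `benchmark_scores` functions, and the fraction-of-constraints reward are hypothetical illustrations, not FIFE's actual checkers.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical constraint: a named, independently verifiable predicate on text.
@dataclass
class Constraint:
    name: str
    check: Callable[[str], bool]


# Illustrative constraint factories (assumed, not FIFE's actual checkers).
def max_words(n: int) -> Constraint:
    return Constraint(f"max_words_{n}", lambda r: len(r.split()) <= n)


def must_include(term: str) -> Constraint:
    return Constraint(f"include_{term}", lambda r: term.lower() in r.lower())


def no_bullets() -> Constraint:
    # Fails if any line starts with a markdown bullet marker.
    return Constraint(
        "no_bullets", lambda r: re.search(r"^\s*[-*+]\s", r, re.MULTILINE) is None
    )


def loose_variants(response: str) -> List[str]:
    # Assumed IFEval-style relaxations: optionally strip "*" emphasis,
    # then optionally drop the first and/or last line.
    variants = []
    for text in (response, response.replace("*", "")):
        lines = text.split("\n")
        variants += [
            text,
            "\n".join(lines[1:]),
            "\n".join(lines[:-1]),
            "\n".join(lines[1:-1]),
        ]
    # Ignore variants that end up empty.
    return [v for v in variants if v.strip()]


def evaluate(response: str, chain: List[Constraint]) -> Dict[str, object]:
    strict = [c.check(response) for c in chain]
    variants = loose_variants(response)
    # Loose: a constraint passes if any relaxed variant satisfies it.
    loose = [any(c.check(v) for v in variants) for c in chain]
    return {
        "strict_prompt": all(strict),  # every constraint in the chain holds
        "loose_prompt": all(loose),
        "reward": sum(strict) / len(chain),  # assumed fine-grained reward
    }


def benchmark_scores(results: List[Dict[str, object]]) -> Tuple[float, float]:
    # Prompt-level accuracy in percent: share of prompts that fully comply.
    n = len(results)
    strict = 100 * sum(bool(r["strict_prompt"]) for r in results) / n
    loose = 100 * sum(bool(r["loose_prompt"]) for r in results) / n
    return strict, loose


chain = [max_words(12), must_include("EBITDA"), no_bullets()]
response = "Sure, here is the summary:\nEBITDA margin rose to 21% on lower input costs."
result = evaluate(response, chain)
print(result)
print(benchmark_scores([result]))
```

Here the filler first line pushes the response past 12 words, so it fails strict scoring (reward 2/3) but passes loose scoring once that line is dropped. Under this assumed prompt-level definition, each reported score would correspond to a whole number of the 88 prompts; for example, 67/88 ≈ 76.1 and 70/88 ≈ 79.5.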

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Stock Market Forecasting Methods