FinForge: Semi-Synthetic Financial Benchmark Generation

Glenn Matlin; Akhil Theerthala; Anant Gupta; Anirudh JM; Rayan Castilla; Yi Mei Ng; Sudheer Chava

arXiv:2601.06747·cs.AI·January 21, 2026

Open Access

TL;DR

FinForge is a semi-synthetic benchmark generation pipeline that creates high-quality, finance-specific evaluation datasets to assess and improve language models' financial reasoning capabilities.

Contribution

We introduce FinForge, a novel hybrid pipeline combining expert curation and LM synthesis to produce large, validated finance benchmarks for evaluating language models.

Findings

01. Models show significant variation in financial reasoning accuracy.

02. Leading models achieve nearly 80% accuracy on the benchmark.

03. FinForge effectively diagnoses current model limitations.

Abstract

Evaluating Language Models (LMs) in specialized, high-stakes domains such as finance remains a significant challenge due to the scarcity of open, high-quality, and domain-specific datasets. Existing general-purpose benchmarks provide broad coverage but lack the depth and domain fidelity needed to assess LMs' capabilities for real-world financial reasoning, which requires both conceptual understanding and quantitative rigor. To address this gap, we introduce FinForge, a scalable, semi-synthetic pipeline for constructing finance-specific evaluation benchmarks through a hybrid of expert-guided data curation and controlled LM-based synthesis. FinForge combines manual and programmatic corpus construction from authoritative financial sources with structured question generation and validation using Gemini 2.5 Flash. To demonstrate the pipeline's efficacy, we produce FinForge-5k, a snapshot…
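The generate-then-validate flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Question` record, the `build_benchmark` function, and the toy `toy_generate` / `toy_validate` stages are all hypothetical names, and in a real pipeline the two stages would wrap LM calls rather than string checks.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Question:
    """Hypothetical benchmark item; the real item schema is not given in the source."""
    source_id: str  # corpus document the item was derived from
    prompt: str
    answer: str


# Assumed stage signatures: each takes the source text so it can stay grounded in it.
Generator = Callable[[str, str], Optional[Question]]
Validator = Callable[[Question, str], bool]


def build_benchmark(
    corpus: Iterable[Tuple[str, str]],
    generate: Generator,
    validate: Validator,
) -> List[Question]:
    """Generate one candidate per document and keep only those that pass validation."""
    kept: List[Question] = []
    for doc_id, text in corpus:
        candidate = generate(doc_id, text)
        if candidate is None:
            continue  # the generator declined to produce an item
        if validate(candidate, text):
            kept.append(candidate)
    return kept


def toy_generate(doc_id: str, text: str) -> Optional[Question]:
    """Stand-in for an LM-based generator: skips empty documents."""
    if not text.strip():
        return None
    return Question(doc_id, "What does the passage state?", text)


def toy_validate(question: Question, text: str) -> bool:
    """Stand-in for an LM-based validator: the answer must appear in the source
    and be at least 10 characters long."""
    return question.answer in text and len(question.answer) >= 10


if __name__ == "__main__":
    corpus = [("a", "Bonds pay coupons."), ("b", ""), ("c", "short")]
    for q in build_benchmark(corpus, toy_generate, toy_validate):
        print(q.source_id, q.prompt)
```

Passing the stages in as callables keeps the curation loop independent of any particular model, so a generator or validator could be swapped without touching the filtering logic.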

Taxonomy

Topics: Topic Modeling · Stock Market Forecasting Methods · FinTech, Crowdfunding, Digital Finance