TL;DR
Fin-PRM is a specialized process reward model designed for financial reasoning in large language models, improving intermediate step verification and overall reasoning accuracy.
Contribution
It introduces a domain-specific, trajectory-aware reward model trained on a high-quality financial reasoning dataset of 3K trajectories, scoring both individual reasoning steps and whole trajectories.
Findings
Fin-PRM outperforms general PRMs on financial benchmarks.
It improves offline trajectory selection and test-time inference.
The model effectively integrates step- and trajectory-level rewards.
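As a rough illustration of how step- and trajectory-level rewards might be combined to select among candidate reasoning trajectories, here is a minimal Python sketch. It is not the paper's method: the `ScoredTrajectory` type, the mean-over-steps aggregation, and the `step_weight` parameter are all hypothetical, and the scores are assumed to come from some external reward model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredTrajectory:
    """A candidate reasoning trajectory with (hypothetical) reward scores."""
    text: str
    step_scores: List[float]   # assumed per-step correctness scores in [0, 1]
    trajectory_score: float    # assumed whole-trajectory score in [0, 1]


def combined_reward(t: ScoredTrajectory, step_weight: float = 0.5) -> float:
    """Blend the mean step score with the trajectory score.

    The linear blend and the default weight are illustrative assumptions.
    """
    if not t.step_scores:
        raise ValueError("trajectory has no scored steps")
    step_mean = sum(t.step_scores) / len(t.step_scores)
    return step_weight * step_mean + (1.0 - step_weight) * t.trajectory_score


def select_best(candidates: List[ScoredTrajectory],
                step_weight: float = 0.5) -> ScoredTrajectory:
    """Return the candidate with the highest combined reward."""
    return max(candidates, key=lambda t: combined_reward(t, step_weight))


if __name__ == "__main__":
    candidates = [
        ScoredTrajectory("A", step_scores=[1.0, 0.0], trajectory_score=1.0),
        ScoredTrajectory("B", step_scores=[1.0, 1.0], trajectory_score=0.75),
    ]
    print(select_best(candidates).text)
```

In this sketch, a trajectory with one flawed step can lose to one whose steps are all sound even when its trajectory-level score is higher, which is the kind of trade-off a combined reward makes explicit.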
Abstract
Process Reward Models (PRMs) supervise intermediate reasoning steps in large language models (LLMs), but existing PRMs are mainly trained on general-domain data and struggle with the structured, symbolic, and fact-sensitive nature of financial reasoning. Financial tasks require not only correct final answers but also verifiable intermediate steps grounded in domain knowledge. In this paper, we propose Fin-PRM, a domain-specialized, trajectory-aware PRM for financial reasoning that jointly models step-level correctness and trajectory-level coherence, producing binary supervision signals for both local and global reasoning quality. To support reliable supervision, we construct a high-quality financial reasoning dataset of 3K trajectories, where step- and trajectory-level labels are automatically derived from multi-source reward signals, including Monte Carlo rollouts, LLM-based…
