ValueBlindBench: Agreement-Gated Stress Testing of LLM-Judged Investment Rationales Before Returns Are Observable

Sidi Chang; Peiying Zhu; Yuxiao Chen

arXiv:2604.25224 · cs.AI · May 5, 2026

TL;DR

ValueBlindBench is an agreement-gated stress-test protocol for evaluating LLM-judged investment rationales before realized returns are observable, addressing the delayed-ground-truth evaluation problem in AI finance.

Contribution

The paper introduces a preregistered, agreement-gated stress-test protocol for pre-deployment evaluation of LLM judges in finance. The protocol decides whether an LLM-judged investment-rationale claim is publishable, qualified, or invalid.

Findings

01. ValueBlindBench clears the agreement gate at 0.7168 but prevents overclaims.

02. Lower-rank systems tend to collapse into a tie-class.

03. Financial constructs like constraint awareness are operationally load-bearing.

Abstract

LLM-based financial agents increasingly produce investment rationales before the outcomes needed to evaluate them are observable. This creates a delayed-ground-truth evaluation problem: realized returns remain the eventual arbiter of investment quality, but they arrive too late and are too noisy to guide many model-development and governance decisions. LLM judges offer a tempting shortcut for pre-deployment evaluation of AI-finance systems, but unvalidated judges may reward verbosity, confidence, or rubric mimicry rather than financial judgment. This paper introduces ValueBlindBench, a preregistered agreement-gated stress-test protocol for deciding when LLM-judged investment-rationale claims are publishable, qualified, or invalid. In a controlled market-state capital-allocation prototype with 1,000 honest decision cycles and 100 preregistered adversarial controls (1,100 trajectories,…
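The agreement-gating idea described in the abstract can be illustrated with a small sketch. The text available here does not specify the agreement statistic, the gate threshold, or the exact rule that maps results to "publishable", "qualified", or "invalid", so everything below is an assumption made for illustration: Cohen's kappa stands in for the agreement statistic, `threshold=0.6` is a hypothetical cutoff, and `gate_claim` with its `passes_adversarial_controls` flag is a hypothetical decision rule, not the paper's.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: kappa is used here only as a stand-in agreement
    statistic; the source does not name the statistic it uses.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def gate_claim(agreement, passes_adversarial_controls, threshold=0.6):
    """Hypothetical three-way gate (illustrative, not the paper's rule).

    - below the agreement threshold: the claim is treated as invalid
    - above it but failing adversarial controls: qualified
    - above it and passing controls: publishable
    """
    if agreement < threshold:
        return "invalid"
    if not passes_adversarial_controls:
        return "qualified"
    return "publishable"


# Toy usage: two judges label four rationales as "good" or "bad".
judge_1 = ["good", "bad", "good", "bad"]
judge_2 = ["good", "bad", "good", "good"]
kappa = cohens_kappa(judge_1, judge_2)
verdict = gate_claim(kappa, passes_adversarial_controls=True)
```

The design point the sketch tries to capture is that the agreement check runs first: a ranking or score from the judge is only interpreted once judges agree well enough, and even then a claim can be downgraded rather than published outright.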
