Scaling Unverifiable Rewards: A Case Study on Visual Insights

Shuyu Gan; James Mooney; Pan Hao; Renxiang Wang; Mingyi Hong; Qianwen Wang; Dongyeop Kang

arXiv:2512.22650 · cs.LG · December 30, 2025

TL;DR

This paper introduces Selective TTS, a process-based refinement framework for multi-stage LLM pipelines that improves output quality and stability when final rewards are unverifiable, demonstrated through visual data insights.

Contribution

We propose Selective TTS, a novel multi-stage refinement method that distributes compute across pipeline stages and prunes low-quality branches early to stabilize and improve LLM-generated insights.

Findings

01

Increased insight quality scores from 61.64 to 65.86.

02

Reduced variance in output quality.

03

Aligned the judge model with human expert ratings (Kendall's τ = 0.55).
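
Kendall's τ measures how often two raters order a pair of items the same way. Which τ variant the paper uses is not stated here; as an illustration only, the sketch below computes tau-b (which tolerates tied ratings) on hypothetical toy scores. The function name and the data are assumptions for illustration, not the paper's code or results.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length lists of scores (handles ties)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied for both raters: excluded from both terms
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # both raters order the pair the same way
        else:
            discordant += 1  # the raters disagree on the pair's order
    denom = sqrt((concordant + discordant + ties_x_only)
                 * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else 0.0


# Hypothetical toy ratings of five insights by a judge model and a human expert.
judge = [3, 1, 2, 5, 4]
human = [2, 1, 3, 5, 4]
print(kendall_tau_b(judge, human))  # → 0.8 (9 concordant pairs, 1 discordant)
```

A τ of 1 means identical orderings, −1 means fully reversed orderings, and 0 means no rank agreement beyond chance.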

Abstract

Large Language Model (LLM) agents can increasingly automate complex reasoning through Test-Time Scaling (TTS), that is, iterative refinement guided by reward signals. However, many real-world tasks involve multi-stage pipelines whose final outcomes lack verifiable rewards or sufficient data to train robust reward models, which makes judge-based refinement prone to accumulating errors across stages. We propose Selective TTS, a process-based refinement framework that scales inference across the different stages of a multi-agent pipeline, rather than repeatedly refining over time as in prior work. By distributing compute across stages and pruning low-quality branches early using process-specific judges, Selective TTS mitigates judge drift and stabilizes refinement. Grounded in the data science pipeline, we build an end-to-end multi-agent pipeline for generating visually insightful charts and reports of given…
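
The idea of spending compute per stage and pruning with stage-specific judges can be illustrated with a minimal sketch that resembles a beam search: at each stage, every surviving candidate is expanded into several branches, the stage's own judge scores them, and only the top few survive. This is an assumption-laden illustration, not the paper's implementation; `Stage`, `selective_tts`, the `branches`/`keep` parameters, and the toy generator and judge are all hypothetical, and the paper's actual allocation and pruning rules may differ.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    # Hypothetical generator: extends a partial pipeline output by one stage.
    generate: Callable[[str, random.Random], str]
    # Hypothetical process-specific judge: scores the output of this stage only.
    judge: Callable[[str], float]


def selective_tts(stages: List[Stage], seed_input: str,
                  branches: int, keep: int, rng: random.Random) -> List[str]:
    """Stage-wise expand-and-prune; returns surviving final candidates, best first."""
    survivors = [seed_input]
    for stage in stages:
        # Spend compute at this stage: sample `branches` continuations per survivor.
        candidates = [stage.generate(s, rng) for s in survivors for _ in range(branches)]
        # Prune early with this stage's judge instead of one end-of-pipeline reward.
        candidates.sort(key=stage.judge, reverse=True)
        survivors = candidates[:keep]
    return survivors


# Toy usage (hypothetical): each stage appends a random digit; the judge sums digits.
def append_digit(text: str, rng: random.Random) -> str:
    return text + str(rng.randint(0, 9))


def digit_sum(text: str) -> float:
    return sum(int(c) for c in text if c.isdigit())


stages = [Stage(f"stage{i}", append_digit, digit_sum) for i in range(3)]
best = selective_tts(stages, "", branches=4, keep=2, rng=random.Random(0))
print(best)
```

Because each judge only scores its own stage's output, a weak branch is discarded before later stages build on it, which is one plausible reading of how early pruning limits error accumulation across stages.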

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling