Seer Self-Consistency: Advance Budget Estimation for Adaptive Test-Time Scaling

Shiyu Ji; Yixuan Wang; Yijun Liu; Qingfu Zhu; Wanxiang Che

arXiv:2511.09345 · cs.CL · January 22, 2026

Open Access

TL;DR

SeerSC is a dynamic self-consistency framework that uses rapid System 1 reasoning to estimate, in advance, how many samples a query needs. It reduces token consumption by up to 47% and inference latency by up to 43% while maintaining performance.

Contribution

The paper presents SeerSC, a novel approach combining System 1 and System 2 reasoning to enhance token efficiency and reduce latency during test-time scaling of LLMs.

Findings

01 Up to 47% reduction in token consumption

02 Up to 43% reduction in inference latency

03 Maintains performance while improving efficiency

Abstract

Test-time scaling improves the inference performance of Large Language Models (LLMs) but also incurs substantial computational costs. Although recent studies have reduced token consumption through dynamic self-consistency, they remain constrained by the high latency of sequential requests. In this paper, we propose SeerSC, a dynamic self-consistency framework that simultaneously improves token efficiency and latency by integrating System 1 and System 2 reasoning. Specifically, we utilize the rapid System 1 to compute the answer entropy for given queries. This score is then used to evaluate the potential of samples for scaling, enabling dynamic self-consistency under System 2. Benefiting from the advance and accurate estimation provided by System 1, the proposed method can reduce token usage while simultaneously achieving a significant decrease in latency through parallel generation. It…
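The abstract describes using answer entropy from fast System 1 outputs to decide, ahead of time, how many System 2 samples a query deserves. The following is only a minimal sketch of that idea, not the authors' implementation: the function names, the linear entropy-to-budget mapping, and the `min_samples`/`max_samples` bounds are all assumptions made for illustration.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical distribution over answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def estimate_budget(system1_answers, min_samples=1, max_samples=16):
    """Map System 1 answer entropy to a System 2 sample budget.

    ASSUMPTION: a linear map from normalized entropy to the budget range.
    The paper's actual scoring and thresholds are not given in this excerpt.
    """
    if not system1_answers:
        raise ValueError("need at least one System 1 answer")
    n = len(system1_answers)
    # Entropy is largest when all n answers differ, i.e. log(n).
    max_entropy = math.log(n) if n > 1 else 0.0
    ratio = answer_entropy(system1_answers) / max_entropy if max_entropy > 0 else 0.0
    ratio = min(max(ratio, 0.0), 1.0)  # guard against floating-point drift
    return min_samples + round(ratio * (max_samples - min_samples))


def majority_vote(answers):
    """Standard self-consistency aggregation: return the most frequent answer."""
    return Counter(answers).most_common(1)[0][0]
```

Because the budget is fixed before any System 2 sampling begins, all of the budgeted samples can be requested in one parallel batch and then aggregated with `majority_vote`, instead of being issued one at a time with a stopping check after each.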

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms