ECHO: Entropy-Confidence Hybrid Optimization for Test-Time Reinforcement Learning

Chu Zhao; Enneng Yang; Yuting Liu; Jianzhe Zhao; Guibing Guo

arXiv:2602.02150 · cs.LG · February 3, 2026

TL;DR

ECHO is an adaptive hybrid optimization method for test-time reinforcement learning. It uses entropy and confidence measures to control exploration during rollouts and to reduce rollout collapse, which improves performance and robustness.

Contribution

The paper proposes ECHO, a novel approach that adaptively controls rollout branching and pruning using entropy and confidence, addressing key challenges in test-time reinforcement learning.
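
As a rough illustration of the two signals named above, the sketch below computes the Shannon entropy of a next-token distribution and a simple confidence score from raw logits. The function names, and the choice of maximum token probability as the confidence measure, are assumptions made for illustration; this summary does not specify how ECHO defines or combines these quantities.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def token_confidence(logits):
    """Assumed confidence proxy: probability of the most likely token."""
    return max(softmax(logits))


if __name__ == "__main__":
    uncertain = [0.0, 0.0, 0.0, 0.0]   # uniform over 4 tokens
    peaked = [10.0, 0.0, 0.0, 0.0]     # one dominant token
    print(token_entropy(uncertain), token_confidence(uncertain))
    print(token_entropy(peaked), token_confidence(peaked))
```

A high-entropy, low-confidence position such as `uncertain` is the kind of node where a tree-structured rollout might branch, while a position such as `peaked` offers little to explore.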

Findings

01. ECHO outperforms existing methods on multiple reasoning benchmarks.

02. ECHO demonstrates better generalization with limited rollout budgets.

03. ECHO enhances training robustness by mitigating early-stage bias.

Abstract

Test-time reinforcement learning generates multiple candidate answers via repeated rollouts and performs online updates using pseudo-labels constructed by majority voting. To reduce overhead and improve exploration, prior work introduces tree-structured rollouts, which share reasoning prefixes and branch at key nodes to improve sampling efficiency. However, this paradigm still faces two challenges: (1) high-entropy branching can trigger rollout collapse, where the branching budget concentrates on a few trajectories with consecutive high-entropy segments, rapidly reducing the number of effective branches; (2) early pseudo-labels are noisy and biased, which can induce self-reinforcing overfitting, causing the policy to sharpen prematurely and suppress exploration. To address these issues, we propose Entropy Confidence Hybrid Group Relative Policy Optimization (ECHO). During rollout, ECHO…
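
The pseudo-labeling step described in the first sentence of the abstract can be sketched as follows. This is a minimal illustration of generic majority-vote pseudo-labels with binary rewards and GRPO-style group-normalized advantages, not ECHO's own procedure (which the truncated abstract does not spell out). The function names, the binary reward, and the `eps` constant are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def majority_vote(answers):
    """Return the most frequent answer; ties go to the first one seen."""
    label, _count = Counter(answers).most_common(1)[0]
    return label


def pseudo_rewards(answers):
    """Reward 1.0 for rollouts that agree with the majority answer, else 0.0."""
    label = majority_vote(answers)
    return label, [1.0 if a == label else 0.0 for a in answers]


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within the rollout group (mean 0, roughly unit scale)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical final answers extracted from six rollouts of one question.
    answers = ["42", "41", "42", "7", "42", "41"]
    label, rewards = pseudo_rewards(answers)
    print(label, rewards)
    print(group_relative_advantages(rewards))
```

Because the label comes from the policy's own samples, a wrong early majority is rewarded as if it were correct, which is the self-reinforcing bias the abstract describes in challenge (2).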

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications