DiSCTT: Consensus-Guided Self-Curriculum for Efficient Test-Time Adaptation in Reasoning
Mohammad Mahdi Moradi, Sudhir Mudur

TL;DR
DiSCTT introduces a dynamic, consensus-guided self-curriculum for test-time adaptation in reasoning models, improving accuracy and efficiency by tailoring optimization strategies based on instance difficulty and uncertainty.
Contribution
The paper presents a framework that chooses the test-time optimization method for each input according to its estimated uncertainty: supervised fine-tuning for high-consensus inputs and reinforcement learning for low-consensus ones.
Findings
Outperforms existing test-time adaptation methods across reasoning benchmarks.
Achieves higher accuracy with lower computational costs.
Reduces variance in model performance.
Abstract
Test-time adaptation offers a promising avenue for improving reasoning performance in large language models without additional supervision, but existing approaches often apply a uniform optimization objective across all inputs, leading to inefficient or unstable adaptation on heterogeneous reasoning problems. We propose DiSCTT, a difficulty-aware, consensus-guided self-curriculum framework that dynamically allocates test-time optimization strategies based on instance-level epistemic uncertainty estimated from agreement among sampled reasoning trajectories. Inputs with high consensus are consolidated via supervised fine-tuning using majority-agreed solutions as pseudo-labels, while low-consensus inputs are optimized via reinforcement learning with a consensus-regularized objective that encourages diversity under relevance constraints. Across a broad suite of mathematical and general…
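The routing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `consensus` and `route`, the use of exact string matching between final answers, and the threshold value of 0.5 are all assumptions made for illustration. The sketch only shows how agreement among sampled answers could decide which branch an input goes to; the fine-tuning and reinforcement learning updates themselves are not shown.

```python
from collections import Counter
from typing import List, Optional, Tuple, TypedDict


class Routing(TypedDict):
    branch: str                  # "sft" or "rl"
    pseudo_label: Optional[str]  # majority answer, only set for the SFT branch
    agreement: float             # fraction of samples matching the majority


def consensus(answers: List[str]) -> Tuple[str, float]:
    """Return the majority answer and the fraction of samples agreeing with it."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    majority, count = Counter(answers).most_common(1)[0]
    return majority, count / len(answers)


def route(answers: List[str], threshold: float = 0.5) -> Routing:
    """Assign an input to a branch based on agreement among sampled answers.

    `threshold` is a hypothetical cutoff chosen for this sketch, not a value
    taken from the paper.
    """
    majority, agreement = consensus(answers)
    if agreement >= threshold:
        # High consensus: treat the majority answer as a pseudo-label.
        return {"branch": "sft", "pseudo_label": majority, "agreement": agreement}
    # Low consensus: no reliable pseudo-label, hand the input to the RL branch.
    return {"branch": "rl", "pseudo_label": None, "agreement": agreement}


if __name__ == "__main__":
    print(route(["42", "42", "42", "17"]))
    print(route(["1", "2", "3", "4"]))
```

In this sketch, agreement is measured on final answers only. A real system would need to normalize answers (for example, treating "1/2" and "0.5" as equal) before counting votes.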
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques
