TTSR: Test-Time Self-Reflection for Continual Reasoning Improvement
Haoyang He, Zihua Rong, Liangjie Zhao, Yunjia Zhao, Lan Yang, Honggang Zhang

TL;DR
TTSR is a self-reflective test-time training framework in which a language model iteratively identifies its own reasoning weaknesses and addresses them with self-generated questions, improving its reasoning at test time.
Contribution
The paper presents TTSR, a novel test-time self-evolving training method in which a single model reflects on and improves its reasoning by alternating between the roles of Student and Teacher in a continual loop.
Findings
Consistently improves mathematical reasoning performance.
Generalizes across different model architectures.
Effective on challenging reasoning benchmarks.
Abstract
Test-time training enables model adaptation using only test questions and offers a promising paradigm for improving the reasoning ability of large language models (LLMs). However, it faces two major challenges: test questions are often highly difficult, making self-generated pseudo-labels unreliable, and existing methods lack effective mechanisms to adapt to a model's specific reasoning weaknesses, leading to inefficient learning. To address these issues, we propose TTSR, a self-reflective test-time self-evolving training framework. TTSR employs a single pretrained language model that alternates between the roles of a Student and a Teacher at test time. The Student focuses on solving problems and learning from synthesized variant questions, while the Teacher analyzes the Student's failed reasoning trajectories, summarizes recurring reasoning weaknesses, and…
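The Student/Teacher alternation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `ToyModel`, `ttsr_loop`, and every method name are hypothetical, the "model" is a set of skill tags rather than an LLM, and the way failures are detected (a boolean from `solve`) is an assumption, since the excerpt does not say how failed trajectories are identified without labels.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Hypothetical stand-in for the single shared language model."""
    skills: set = field(default_factory=set)

    # --- Student role ---
    def solve(self, question):
        # Toy assumption: a question is solved only if its skill is known.
        return question["skill"] in self.skills

    def learn(self, variants):
        # Placeholder for a training update on synthesized variant questions.
        for variant in variants:
            self.skills.add(variant["skill"])

    # --- Teacher role ---
    def summarize_weaknesses(self, failures, top_k=1):
        # Pick the most frequent skill tags among the failed questions.
        counts = Counter(q["skill"] for q in failures)
        return [skill for skill, _ in counts.most_common(top_k)]

    def synthesize_variants(self, weaknesses, n=2):
        # Produce n toy variant questions per summarized weakness.
        return [
            {"skill": w, "text": f"variant {i} targeting {w}"}
            for w in weaknesses
            for i in range(n)
        ]


def ttsr_loop(model, test_questions, rounds=3):
    """Alternate Student and Teacher roles; return failures per round."""
    history = []
    for _ in range(rounds):
        # Student: attempt the test questions and collect failures.
        failures = [q for q in test_questions if not model.solve(q)]
        history.append(len(failures))
        if not failures:
            break
        # Teacher: summarize weaknesses and synthesize targeted variants.
        weaknesses = model.summarize_weaknesses(failures)
        variants = model.synthesize_variants(weaknesses)
        # Student: learn from the synthesized variants.
        model.learn(variants)
    return history


questions = [{"skill": "algebra"}, {"skill": "algebra"}, {"skill": "geometry"}]
model = ToyModel()
history = ttsr_loop(model, questions, rounds=3)
```

In this toy run the failure count drops each round because the Teacher targets the most frequent weakness first. In a real system each method would be a prompt to, or a training update on, the same underlying model.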
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning
