TL;DR
RuPLaR introduces a single-step latent reasoning framework guided by rule-based priors, significantly improving accuracy and efficiency in LLM reasoning tasks.
Contribution
It proposes a novel one-model, one-step latent reasoning method that reduces complexity and error propagation in latent CoT reasoning.
Findings
Improves accuracy by 11.1% over existing latent CoT methods.
Achieves reasoning with minimal token usage.
Demonstrates extensibility and effectiveness through extensive experiments.
Abstract
The Chain-of-Thought (CoT) paradigm, while enhancing the interpretability of Large Language Models (LLMs), is constrained by the inefficiencies and expressive limits of natural language. Latent Chain-of-Thought (latent CoT) reasoning, which operates in a continuous latent space, offers a promising alternative but faces challenges from structural complexities in existing multi-step or multi-model paradigms, such as error propagation and coordination overhead. In this paper, we introduce One-Model One-Step, a novel compression framework for Latent Reasoning with Rule-Based Priors (RuPLaR) to address this challenge. Our method trains an LLM to autonomously generate latent reasoning tokens in a single training stage, guided by rule-based prior probability distributions, thereby eliminating cascaded processes and inter-model dependencies. To ensure reasoning quality, we design a joint…
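The abstract is truncated before it explains how the rule-based prior distributions enter training. The sketch below is a purely illustrative, generic pattern and not RuPLaR's actual objective. It assumes two things:

- A continuous latent token is formed as a probability-weighted mixture of token embeddings.
- A prior steers training through a KL-divergence penalty added to a task loss.

All names (`latent_token`, `prior_guided_loss`, `weight`) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def latent_token(probs, embeddings):
    """Hypothetical continuous latent token: probability-weighted mix of embeddings."""
    dim = len(embeddings[0])
    return [sum(p * emb[d] for p, emb in zip(probs, embeddings)) for d in range(dim)]


def prior_guided_loss(logits, rule_prior, task_loss, weight=0.1):
    """Hypothetical objective: task loss plus a KL penalty toward a rule-based prior.

    `rule_prior` is an assumed, externally supplied distribution; the paper's
    actual prior construction and joint objective are not described here.
    """
    p_model = softmax(logits)
    return task_loss + weight * kl_divergence(rule_prior, p_model)


if __name__ == "__main__":
    emb = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(latent_token([0.0, 1.0, 0.0], emb))
    print(prior_guided_loss([0.0, 0.0], [0.5, 0.5], task_loss=1.0))
```

When the model's distribution already matches the prior, the KL term vanishes and only the task loss remains. The `weight` parameter is an assumed knob that trades off fidelity to the prior against the task objective.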
