TL;DR
The paper introduces Abstract Chain-of-Thought, a discrete latent reasoning method that reduces the number of reasoning tokens generated at inference time while maintaining performance across a range of reasoning tasks.
Contribution
It proposes a novel post-training latent reasoning mechanism with a warm-up loop and reinforcement learning to generate an abstract reasoning language.
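The overall training schedule can be pictured as a simple control loop. This is a sketch under stated assumptions, not the authors' code. The alternation of the two warm-up phases follows the abstract. Placing reinforcement learning after the warm-up is inferred from the word "warm-up". The names `post_train`, `bottleneck_sft`, `self_distill`, and `rl_stage` are hypothetical stand-ins for the real training steps.

```python
from typing import Callable, List


def post_train(
    num_warmup_rounds: int,
    bottleneck_sft: Callable[[int], None],
    self_distill: Callable[[int], None],
    rl_stage: Callable[[], None],
) -> None:
    """Hypothetical schedule: alternate the two warm-up phases, then run RL."""
    for round_idx in range(num_warmup_rounds):
        # Phase (i): mask the verbal CoT so the answer has to rely on the
        # abstract tokens, then fine-tune (interpretation assumed).
        bottleneck_sft(round_idx)
        # Phase (ii): train the model to emit abstract tokens from the prompt alone.
        self_distill(round_idx)
    # Assumed ordering: reinforcement learning runs once the warm-up is done.
    rl_stage()


if __name__ == "__main__":
    log: List[str] = []
    post_train(
        num_warmup_rounds=2,
        bottleneck_sft=lambda r: log.append(f"sft:{r}"),
        self_distill=lambda r: log.append(f"distill:{r}"),
        rl_stage=lambda: log.append("rl"),
    )
    print(log)  # prints ['sft:0', 'distill:0', 'sft:1', 'distill:1', 'rl']
```

Passing the phases in as callbacks keeps the schedule separate from any particular model or training framework.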
Findings
Requires up to 11.6× fewer reasoning tokens.
Achieves comparable performance on reasoning tasks.
Emergent power-law distribution over the abstract vocabulary.
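A common way to check for a power law over a token vocabulary is to sort token counts by rank and fit a straight line in log-log space. This is a rank-frequency (Zipf-style) test. The page does not say how the paper measured its distribution, so treat this as one illustrative method. `rank_frequency_slope` is a hypothetical helper, and the sample below is synthetic, not data from the paper.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def rank_frequency_slope(tokens: Iterable[str]) -> Tuple[float, float]:
    """Fit log(freq) = intercept + slope * log(rank) by ordinary least squares."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Synthetic sample (NOT data from the paper): token <a_r> appears
    # round(1200 / r) times, i.e. a Zipf law with exponent 1 up to rounding.
    sample = [f"<a_{r}>" for r in range(1, 51) for _ in range(round(1200 / r))]
    slope, _ = rank_frequency_slope(sample)
    print(f"fitted slope: {slope:.3f}")
```

A fitted slope that is close to constant across ranges of the data suggests power-law behavior. A proper test would also compare the fit against alternative distributions.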
Abstract
While long, explicit chains-of-thought (CoT) have proven effective on complex reasoning tasks, they are costly to generate during inference. Non-verbal reasoning methods that leverage continuous representations achieve shorter generation lengths, yet their performance lags behind verbalized CoT. We propose a discrete latent reasoning post-training mechanism in which the language model produces a short sequence of tokens from a reserved vocabulary in lieu of a natural-language CoT before generating a response. To make the previously unseen "abstract" tokens useful, we introduce a policy-iteration-style warm-up loop that alternates between (i) bottlenecking from a verbal CoT via masking and performing supervised fine-tuning, and (ii) self-distillation by training the model to generate abstract tokens from the prompt alone via constrained…
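The abstract is cut off at "constrained…". A natural reading is constrained decoding, where generation is restricted to the reserved vocabulary. The sketch below assumes that reading. It uses greedy selection rather than sampling. The model is a hypothetical 10-token scoring function, `toy_logits`, and the reserved ids {7, 8, 9} are invented for illustration. `constrained_greedy_decode` is likewise a hypothetical helper, not an API from the paper or any library.

```python
from typing import Callable, List, Sequence, Set


def constrained_greedy_decode(
    next_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    allowed_ids: Set[int],
    max_new_tokens: int,
) -> List[int]:
    """Greedy decoding restricted to a reserved set of token ids."""
    if not allowed_ids:
        raise ValueError("allowed_ids must be non-empty")
    candidates = sorted(allowed_ids)  # fixed order so ties break deterministically
    ids = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = next_logits(ids)
        # Taking the argmax over allowed ids only is equivalent to setting every
        # logit outside the reserved vocabulary to -inf before a full argmax.
        best = max(candidates, key=lambda tok: logits[tok])
        ids.append(best)
        generated.append(best)
    return generated


def toy_logits(ids: List[int]) -> List[float]:
    """Hypothetical 10-token 'model': verbal token 2 always scores highest,
    and one reserved token (7, 8 or 9) gets a bonus depending on context length."""
    scores = [0.0] * 10
    scores[2] = 5.0
    scores[7 + len(ids) % 3] = 1.0
    return scores


if __name__ == "__main__":
    # Reserved "abstract" ids {7, 8, 9} are an assumption for this toy example.
    print(constrained_greedy_decode(toy_logits, [1, 4], {7, 8, 9}, max_new_tokens=3))
    # prints [9, 7, 8]
```

The unconstrained model would always pick token 2. The constraint forces every generated token to come from the reserved set, which is the property the abstract relies on when training the model to emit abstract tokens from the prompt alone.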
