Controllable Mathematical Reasoning via Self-Optimizing Thought Vectors
Xuying LI

TL;DR
This paper introduces a method for controllable mathematical reasoning in large language models. It uses self-optimizing thought vectors trained with an entropy-minimization objective, which keeps reasoning focused and steerable without external reward annotations.
Contribution
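The paper proposes learnable thought vectors, trained with entropy minimization, that modulate the internal reasoning process of a language model. It reports 90.1% accuracy on GSM8K with a controllability score of 0.42, along with interpretable clusters of thought vectors.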
Findings
Achieved 90.1% accuracy on GSM8K with Gemma-2-9B, with a controllability score of 0.42.
Thought vectors form distinct clusters correlating with reasoning patterns.
Entropy-based rewards guide focused reasoning without external annotations.
Abstract
We present a novel approach for controllable mathematical reasoning that leverages self-optimizing thought vectors with entropy minimization. Our method introduces learnable thought vectors that dynamically modulate the internal reasoning process of large language models. Using Gemma-2-9B on GSM8K, we achieve 90.1% accuracy with a controllability score of 0.42, demonstrating that entropy-based rewards effectively guide focused reasoning patterns without requiring external reward annotations. Our analysis reveals distinct thought vector clusters and consistent low-entropy distributions across control conditions, validating our framework for controllable AI reasoning.
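The abstract does not spell out how the entropy-based reward is computed, so the following is only a minimal sketch of one plausible reading: score a reasoning trace by the negative mean Shannon entropy of its per-step next-token distributions, so that more focused (lower-entropy) traces receive a higher reward. The function names (`softmax`, `shannon_entropy`, `entropy_reward`) and the toy logits are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_reward(step_logits):
    """Hypothetical reward: negative mean entropy over all generation steps.

    This is an assumed formulation for illustration, not the paper's
    stated objective. Lower entropy (more focused predictions) gives a
    higher reward.
    """
    entropies = [shannon_entropy(softmax(logits)) for logits in step_logits]
    return -sum(entropies) / len(entropies)


# Toy logits over a 4-token vocabulary for two generation steps each.
focused = [[8.0, 0.0, 0.0, 0.0], [0.0, 9.0, 0.0, 0.0]]
diffuse = [[1.0, 1.0, 1.0, 1.0], [0.5, 0.4, 0.6, 0.5]]

print("focused reward:", entropy_reward(focused))
print("diffuse reward:", entropy_reward(diffuse))
```

Under this assumed formulation, the reward needs only the model's own output distributions, which is consistent with the claim that no external reward annotations are required. In a real training setup the logits would come from the language model conditioned on a thought vector rather than from hand-written lists.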
