Controllable Mathematical Reasoning via Self-Optimizing Thought Vectors

Xuying LI

arXiv:2510.22132·cs.AI·October 28, 2025

TL;DR

This paper introduces a method for controllable mathematical reasoning in large language models. Learnable thought vectors are optimized with an entropy-minimization reward, which steers the model toward focused reasoning without external reward annotations.

Contribution

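The paper proposes learnable thought vectors, optimized by entropy minimization, that modulate the internal reasoning process of a language model. It reports 90.1% accuracy on GSM8K with Gemma-2-9B and distinct thought vector clusters that make the reasoning patterns easier to interpret.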

Findings

1. Achieved 90.1% accuracy on GSM8K with Gemma-2-9B.
2. Thought vectors form distinct clusters correlating with reasoning patterns.
3. Entropy-based rewards guide focused reasoning without external annotations.

Abstract

We present a novel approach for controllable mathematical reasoning that leverages self-optimizing thought vectors with entropy minimization. Our method introduces learnable thought vectors that dynamically modulate the internal reasoning process of large language models. Using Gemma-2-9B on GSM8K, we achieve 90.1% accuracy with a controllability score of 0.42, demonstrating that entropy-based rewards effectively guide focused reasoning patterns without requiring external reward annotations. Our analysis reveals distinct thought vector clusters and consistent low-entropy distributions across control conditions, validating our framework for controllable AI reasoning.
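
To make the idea of an entropy-based reward concrete, here is a minimal sketch in Python. It is an illustration only, not the paper's formulation: the abstract does not say which distribution the entropy is computed over or how the reward is scaled. The sketch assumes the reward is simply the negative Shannon entropy of a softmax distribution over some logits, and the function names (softmax, shannon_entropy, entropy_reward) are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_reward(logits):
    """Assumed reward: negative entropy, so peaked distributions score higher."""
    return -shannon_entropy(softmax(logits))


# A peaked distribution (confident) versus a flat one (uncertain).
peaked = [8.0, 0.0, 0.0, 0.0]
flat = [1.0, 1.0, 1.0, 1.0]

# The peaked distribution should receive the higher reward.
print(entropy_reward(peaked) > entropy_reward(flat))
```

Under this assumption the reward needs no labels: it depends only on how concentrated the model's own distribution is, which matches the abstract's claim that no external reward annotations are required.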
