Rethinking Thinking Tokens: Understanding Why They Underperform in Practice

Sreeram Vennam; David Valente; David Herel; Ponnurangam Kumaraguru

arXiv:2411.11371·cs.CL·November 19, 2024

TL;DR

This paper investigates why Thinking Tokens (TTs) underperform Chain-of-Thought (CoT) reasoning in language models. It attributes the gap to the reliance on a single embedding for TTs, which yields inconsistent learning signals and noisy gradients, and supports this hypothesis with empirical analysis.

Contribution

It offers a detailed empirical analysis explaining the underperformance of Thinking Tokens and discusses implications for future unsupervised reasoning methods in LLMs.

Findings

01

Thinking Tokens marginally improve performance but underperform CoT.

02

Single embedding reliance causes inconsistent learning signals.

03

Noisy gradients hinder effective reasoning in TTs.

Abstract

Thinking Tokens (TTs) have been proposed as an unsupervised method to facilitate reasoning in language models. However, despite their conceptual appeal, our findings show that TTs only marginally improve performance and consistently underperform Chain-of-Thought (CoT) reasoning across multiple benchmarks. We hypothesize that this underperformance stems from the reliance on a single embedding for TTs, which results in inconsistent learning signals and introduces noisy gradients. This paper provides a comprehensive empirical analysis to validate this hypothesis and discusses the implications for future research on unsupervised reasoning in LLMs.
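
The single-embedding hypothesis can be illustrated with a toy sketch. It is not the paper's experiment: the gradient vectors below are synthetic, and the helper names (`aggregate_gradient`, `agreement_ratio`) are hypothetical. The only assumption taken from the abstract is that one shared embedding is reused at every Thinking Token position, so the update it receives is the sum of the gradients from all of those positions. If the positions ask for different things, the contributions largely cancel and the net signal is weak relative to the per-position signals.

```python
import math
import random


def norm(v):
    """Euclidean norm of a vector stored as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def aggregate_gradient(per_position_grads):
    """Sum the gradients that each occurrence of the shared embedding receives."""
    dim = len(per_position_grads[0])
    return [sum(g[i] for g in per_position_grads) for i in range(dim)]


def agreement_ratio(per_position_grads):
    """Norm of the summed gradient divided by the sum of individual norms.

    1.0 means every position pushes the embedding in the same direction;
    values near 0 mean the contributions mostly cancel each other out.
    """
    total = aggregate_gradient(per_position_grads)
    denom = sum(norm(g) for g in per_position_grads)
    return norm(total) / denom if denom > 0 else 0.0


def random_unit_vector(rng, dim):
    """Synthetic stand-in for a gradient with an arbitrary direction."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = norm(v)
    return [x / n for x in v]


rng = random.Random(0)
dim = 64          # assumed embedding size, chosen only for illustration
positions = 32    # assumed number of places the shared token appears

# Case 1: every position wants the same update (a consistent learning signal).
aligned = [[1.0] + [0.0] * (dim - 1) for _ in range(positions)]

# Case 2: every position wants an unrelated update (an inconsistent signal).
conflicting = [random_unit_vector(rng, dim) for _ in range(positions)]

print("aligned ratio:    ", agreement_ratio(aligned))
print("conflicting ratio:", agreement_ratio(conflicting))
```

In the aligned case the ratio is 1; in the conflicting case it falls well below 1, because unrelated directions in a high-dimensional space add up to a vector much shorter than the sum of their lengths. Whether real Thinking Token gradients behave this way is exactly what the paper's empirical analysis is meant to test.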

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Education and Critical Thinking Development