Supervised Chain of Thought
Xiang Zhang, Dujian Ding

TL;DR
This paper analyzes how task-specific supervision improves the reasoning capabilities of Large Language Models by addressing limitations of the one-prompt-for-all approach in Chain of Thought prompting.
Contribution
It demonstrates theoretically and empirically that task-specific supervision enhances LLM reasoning by better navigating prompt and answer spaces, overcoming limitations of generic prompts.
Findings
Supervision improves reasoning accuracy in LLMs.
Task-specific prompts outperform generic prompts.
Theoretical analysis links supervision to enhanced computability.
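The contrast between a generic prompt and a task-specific prompt can be sketched as follows. This is a minimal illustration, not the authors' implementation: `build_prompt`, `GENERIC_INSTRUCTION`, `TASK_INSTRUCTIONS`, and the instruction wording are all hypothetical names and text chosen for the example.

```python
from typing import Optional

# Generic instruction reused for every task (the "one-prompt-for-all" style).
GENERIC_INSTRUCTION = "Think step by step."

# Hypothetical task-specific instructions; the wording is illustrative only
# and is not taken from the paper.
TASK_INSTRUCTIONS = {
    "parity": (
        "Read the bits from left to right. After each bit, write the running "
        "parity (0 or 1). Report the final parity as the answer."
    ),
    "sorting": (
        "Repeatedly write out the smallest remaining number and the list "
        "that is left, until the list is empty."
    ),
}


def build_prompt(question: str, task: Optional[str] = None) -> str:
    """Return the question followed by a task-specific instruction if one
    is known for the task, otherwise by the generic instruction."""
    instruction = TASK_INSTRUCTIONS.get(task, GENERIC_INSTRUCTION)
    return f"{question}\n{instruction}"
```

Calling `build_prompt(question)` with no task reproduces the generic setup, while passing `task="parity"` tells the model which intermediate steps to write out.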
Abstract
Large Language Models (LLMs) have revolutionized natural language processing and hold immense potential for advancing Artificial Intelligence. However, the core architecture of most mainstream LLMs -- the Transformer -- has inherent limitations in computational depth, rendering them theoretically incapable of solving many reasoning tasks that demand increasingly deep computations. Chain of Thought (CoT) prompting has emerged as a technique to address these architectural limitations, as evidenced by several theoretical studies. It offers a promising approach to solving complex reasoning tasks that were previously beyond the capabilities of these models. Despite its successes, CoT and its variants (such as Tree of Thought, Graph of Thought, etc.) rely on a "one-prompt-for-all" approach, using a single prompt structure (e.g., "think step by step") for a wide range of tasks -- from counting…
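As a loose analogy for how written-out intermediate steps can stand in for computational depth, the sketch below computes the parity of a bit string while recording the running state as text after every step. It is plain Python, not a Transformer, and `parity_with_trace` is a hypothetical helper introduced only for illustration: each recorded line plays the role of a reasoning step that carries the state forward to the next one.

```python
from typing import List, Tuple


def parity_with_trace(bits: str) -> Tuple[int, List[str]]:
    """Compute the parity of a string of '0'/'1' characters and return it
    together with a textual trace of the running state after each bit."""
    state = 0            # running parity carried from step to step
    trace: List[str] = []
    for i, bit in enumerate(bits, start=1):
        state ^= int(bit)  # update the state using only the current bit
        trace.append(f"step {i}: read {bit}, parity = {state}")
    return state, trace


if __name__ == "__main__":
    result, steps = parity_with_trace("1011")
    for line in steps:
        print(line)
    print("answer:", result)
```

The number of trace lines grows with the input length, which mirrors the idea that the number of sequential steps is no longer fixed in advance.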
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
The paper effectively addresses the limitations of Transformer-based LLMs in handling complex reasoning tasks, particularly due to computational depth constraints, making a strong case for exploring alternatives like CoT prompting. By analyzing Transformer limitations and explaining how CoT extends reasoning capabilities, the paper provides a robust theoretical foundation, linking CoT’s potential to overcoming computational constraints. Introducing task-specific “supervised” CoT as an alternat
While the theoretical foundation of CoT and supervised prompting is well-explained, there is little mention of extensive empirical results to support the effectiveness of the proposed supervised CoT approach. Without comprehensive experiments, the practical impact and robustness of the approach may be less convincing. The introduction is densely packed with technical details and may be challenging for readers unfamiliar with computational depth limitations and CoT theory. Simplifying or clarify
The authors address an important shortcoming of Transformers: unlike recurrent networks, they are not able to perform reasoning over an arbitrary number of sequential steps (depth). Since the number of sequential steps in transformers is fixed and limited by the number of layers, CoT provides a discretized approach to adding a hidden state in autoregressive transformers at each step. The authors then formulate CoT reasoning into 'prompt space' and 'answer space' and demonstrate the complexity of
* **Unclear Focus on Discovery vs. Throughput**: While the paper's motivation is reasonable, it is unclear whether the authors aim to increase the chance of discovering the correct solution or the model's throughput (e.g., number of output tokens) by selecting the right prompt from the prompt space. In some examples, such as BFS versus DFS, different path lengths (number of output tokens) result in correct answers, but this is not clearly addressed.
* **Questionable Claim on Heuristic-Driven Tem
1. The paper is well-structured, with clear explanations of core concepts such as prompt space and answer space, and the step-by-step breakdown of task examples provides a clear understanding of the benefits of CoT.
2. While CoT itself is an established technique, this work takes a novel approach by critically analyzing the limitations of traditional, unsupervised CoT.
1. Although this paper discusses the importance of supervised CoT, this supervision is in the form of case-by-case human feedback. This paper does not draw a conclusion on how to make supervision based on the proposed theory. While the paper demonstrates that task-specific supervision improves reasoning, it does not discuss the feasibility or cost of providing this supervision at scale. Implementing supervised CoT across various applications would likely demand substantial domain expertise and
- The authors provide a comprehensive summary of previous works which aim to understand the mechanism of CoT reasoning
- The authors offer an interesting perspective on the role of CoT prompts
- **The conclusions are already well-established**: It is widely recognized that LLM task performance varies based on the given prompt, a fact that has driven the growth of prompt engineering as a specialized field. Similarly, it is not surprising that LLMs can demonstrate improved reasoning abilities when guided by prompts with structured reasoning steps.
- **Theoretical analysis offers limited insights and could benefit from further rigor**: The theoretical analysis in Section 3 provides an i
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Multimodal Machine Learning Applications
Methods: Dense Connections · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Attention Is All You Need · Adam · Linear Layer · Softmax · Multi-Head Attention · Dropout
