DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
Murong Yue, Wenlin Yao, Haitao Mi, Dian Yu, Ziyu Yao, Dong Yu

TL;DR
DOTS introduces a dynamic reasoning approach for LLMs that searches for optimal reasoning trajectories tailored to each question, significantly improving reasoning performance over static methods.
Contribution
The paper proposes a method for dynamic reasoning in LLMs that searches for, and trains on, optimal reasoning trajectories tailored to each question and to the capability of the task-solving LLM.
Findings
Outperforms static reasoning techniques and vanilla instruction tuning.
Enables LLMs to adapt reasoning depth based on problem complexity.
Consistently improves performance across eight reasoning tasks.
Abstract
Enhancing the capability of large language models (LLMs) in reasoning has gained significant attention in recent years. Previous studies have demonstrated the effectiveness of various prompting strategies in aiding LLMs in reasoning (called "reasoning actions"), such as step-by-step thinking, reflecting before answering, solving with programs, and their combinations. However, these approaches often applied static, predefined reasoning actions uniformly to all questions, without considering the specific characteristics of each question or the capability of the task-solving LLM. In this paper, we propose DOTS, an approach enabling LLMs to reason dynamically via optimal reasoning trajectory search, tailored to the specific characteristics of each question and the inherent capability of the task-solving LLM. Our approach involves three key steps: i) defining atomic reasoning action modules…
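As a rough illustration of what searching for a question-specific reasoning trajectory could look like, here is a minimal sketch. It is not the authors' implementation. The two-slot action layout, the function names (`enumerate_trajectories`, `search_best_trajectory`, `toy_solve`, `toy_is_correct`), the sample count, and the tie-breaking rule are all assumptions made for this example. Only the action names echo strategies mentioned in the abstract, and the toy solver stands in for real LLM calls.

```python
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical action slots. The action names echo strategies the abstract
# mentions; the two-slot layout is an assumption made for this sketch.
ANALYSIS_ACTIONS = ["none", "reflect_before_answering"]
SOLUTION_ACTIONS = ["step_by_step", "program"]

Trajectory = Tuple[str, str]


def enumerate_trajectories() -> List[Trajectory]:
    """All combinations of one analysis action and one solution action."""
    return list(product(ANALYSIS_ACTIONS, SOLUTION_ACTIONS))


def search_best_trajectory(
    question: str,
    solve: Callable[[str, Trajectory, int], str],
    is_correct: Callable[[str], bool],
    n_samples: int = 4,
) -> Tuple[Trajectory, float]:
    """Return the trajectory with the highest empirical success rate.

    Each trajectory is sampled n_samples times; ties go to the trajectory
    that was enumerated first (an arbitrary choice for this sketch).
    """
    candidates = enumerate_trajectories()
    best, best_rate = candidates[0], -1.0
    for trajectory in candidates:
        hits = sum(
            is_correct(solve(question, trajectory, i)) for i in range(n_samples)
        )
        rate = hits / n_samples
        if rate > best_rate:
            best, best_rate = trajectory, rate
    return best, best_rate


def toy_solve(question: str, trajectory: Trajectory, sample_idx: int) -> str:
    """Stand-in for an LLM call (hypothetical behavior): the 'program' action
    is always right, 'step_by_step' is right only on even-numbered samples."""
    if trajectory[1] == "program" or sample_idx % 2 == 0:
        return "391"
    return "381"


def toy_is_correct(answer: str) -> bool:
    """Check the toy answer against the known result of 17 * 23."""
    return answer == "391"


best, rate = search_best_trajectory("What is 17 * 23?", toy_solve, toy_is_correct)
print(best, rate)
```

With the toy solver, the search settles on the program-based trajectory because it succeeds on every sample. In a real setting, `solve` would call the task-solving LLM, so the chosen trajectory would depend on both the question and that model's capability.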
Peer Reviews
Decision: ICLR 2025 Poster
1. The authors propose a dynamic reasoning method that lets the model choose appropriate atomic actions based on the characteristics of the input question.
2. The authors conduct comprehensive experiments to demonstrate the effectiveness of the proposed method, covering in-distribution, few-shot, and out-of-distribution settings.
3. The proposed method can be used with both open-source and closed-source models.
1. The proposed method does not show significant improvement on out-of-distribution (OOD) tasks, and it incurs additional computational overhead compared to prompt engineering methods.
2. The vanilla SFT baseline uses only CoT training data. I believe the method should also be compared with baselines trained on other reasoning formats, such as the program reasoning format or mixed CoT and program training data, to demonstrate its effectiveness.
3. The ex
1. Adaptability: the method can be applied either to the planner or to the task solver.
2. The method shows lower cost than non-CoT methods.
3. Improved reasoning ability: the paper demonstrates improved reasoning by allowing dynamic selection of reasoning actions for a given question, outperforming static methods as well as self-refinement (in most but not all scenarios).
1. The proposed method is complex and involves several steps. It is not clear whether the complexity is warranted; ablation studies on the various components would help.
2. The paper does not explore decomposition or tool use, which would be crucial for complex tasks.
3. The paper glosses over how the atomic trajectories are collected.
DOTS unifies many recent prompting-based reasoning methods for LLMs and can dynamically choose a better method for each module and each data sample. It borrows the strengths of various solutions in the literature and can possibly incorporate future work. The paper contains comprehensive experimental results on various datasets across domains, in in-distribution, near-distribution, and out-of-distribution settings. The experiments consistently show superior performance of DOTS over previous prompting methods. The paper also includes abl
Although DOTS has higher average performance, it cannot consistently beat the baselines on all tasks. This is a little surprising, as DOTS can be considered a superset of all baselines. On the datasets where DOTS falls behind the baselines, dynamic reasoning would seem unnecessary (one could choose the baseline instead of choosing different modules for each data sample). This phenomenon occurs more often in out-of-distribution settings. The average score on out-of-distribution
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Artificial Intelligence in Law
Methods: Softmax · Attention Is All You Need
