Strategic Planning and Rationalizing on Trees Make LLMs Better Debaters

Danqing Wang; Zhuorui Ye; Xinran Zhao; Fei Fang; Lei Li

arXiv:2505.14886·cs.CL·December 22, 2025

TL;DR

This paper introduces TreeDebater, a novel debate framework using tree structures to improve strategic reasoning and argumentation in competitive debates, leading to better performance than existing systems.

Contribution

The paper presents TreeDebater, a new debate system with Rehearsal and Debate Flow Trees that enhance strategic planning and argument evaluation in AI debate models.

Findings

1. TreeDebater outperforms state-of-the-art debate systems in persuasiveness.
2. Achieves +15.6% in stage-level persuasiveness and +10% in debate-level opinion shift.
3. Demonstrates better time management strategies similar to human experts.

Abstract

Winning competitive debates requires sophisticated reasoning and argument skills. There are unique challenges in the competitive debate: (1) The time constraints force debaters to make strategic choices about which points to pursue rather than covering all possible arguments; (2) The persuasiveness of the debate relies on the back-and-forth interaction between arguments, which a single final game status cannot evaluate. To address these challenges, we propose TreeDebater, a novel debate framework that excels in competitive debate. We introduce two tree structures: the Rehearsal Tree and Debate Flow Tree. The Rehearsal Tree anticipates the attack and defenses to evaluate the strength of the claim, while the Debate Flow Tree tracks the debate status to identify the active actions. TreeDebater allocates its time budget among candidate actions and uses the speech time controller and…
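To make the Rehearsal Tree idea concrete, here is a minimal sketch of how anticipated attacks and defenses could be folded into a single strength score for a claim. It is an illustration under stated assumptions, not the paper's formula: the `Node` class, the `base` score, the `weight` parameter, and the averaging rule in `strength` are all hypothetical. A defense is modeled as an attack on the attacking node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One claim in a hypothetical rehearsal tree."""
    claim: str
    base: float                # assumed standalone score in [0, 1]
    relation: str = "root"     # "support" or "attack", relative to the parent
    children: List["Node"] = field(default_factory=list)


def strength(node: Node, k: int, weight: float = 0.5) -> float:
    """Assumed k-step look-ahead: supports raise the score, attacks lower it."""
    if k == 0 or not node.children:
        return node.base
    delta = 0.0
    for child in node.children:
        s = strength(child, k - 1, weight)
        delta += s if child.relation == "support" else -s
    # Average the children's net impact, then clamp to [0, 1].
    score = node.base + weight * delta / len(node.children)
    return max(0.0, min(1.0, score))


# A claim with one anticipated attack (which itself has a rebuttal) and one support.
root = Node("main claim", 0.6, children=[
    Node("anticipated attack", 0.8, "attack",
         children=[Node("prepared rebuttal", 0.5, "attack")]),
    Node("supporting evidence", 0.4, "support"),
])

print(round(strength(root, 0), 4))  # 0.6    (no look-ahead)
print(round(strength(root, 1), 4))  # 0.5    (attack counted at full strength)
print(round(strength(root, 2), 4))  # 0.5625 (rebuttal weakens the attack)
```

In this toy example, looking one step further ahead raises the claim's score because the prepared rebuttal weakens the anticipated attack. That is the kind of interaction a score based on the claim alone would miss.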

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The paper tackles competitive debate where persuasiveness determines success, and it operationalizes time budgeting and action selection under evolving interaction. The two‑tree design mirrors how human debaters prepare and flow debates.
2. The Rehearsal Tree formalizes pre‑debate planning with a k‑step strength score that blends support and attack impacts.
3. The Debate Flow Tree offers a practical, auditable representation for candidate action extraction during the match.
4. Good empirical

Weaknesses

1. TreeDebater alone uses an iterative time controller; the baseline is given only “rough word budgets” and is then audio‑trimmed, which can truncate arguments mid‑point and plausibly depress persuasiveness.
2. Missing detailed ablations for each component of the TreeDebater in achieving the final persuasiveness. For example, the impact of Rehearsal Tree and Debate Flow Tree separately is unknown.
3. While the overall pipeline is well designed, the fundamental idea shares similarity with Tree‑o

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The tree-based methods are novel and interesting. In particular, it has a nice approach of estimating the argument strength not just in terms of the argument itself but in terms of anticipating its full impact including the estimated opponent actions that will follow.
2. Human evaluation was done at the level of a specific stage as well as the overall debate impact, and shows some persuasiveness gains from the proposed approach.

Weaknesses

1. There are many details here and no provided code, so reproducibility is a major issue. Moreover, given that the paper is focused on a comparison to Agent4Debate, I think it is missing a clearer accounting of how and in what architecture the method here integrates with the agent system described there. The diagram in Figure 1 is helpful but is quite vague in terms of understanding how the LLM agents are used in practice when generating the debate.
2. I felt that the part about simulated audien

Reviewer 03 · Rating 4 · Confidence 4

Strengths

The suggested system is inspired by the way humans prepare for and conduct a competitive debate. At a high level, the system's principles and architecture look reasonable, but details are missing (see below). Evaluation includes both stage-level and end-to-end human preference experiments, as well as additional fine-grained analysis of the debates that the system produces. Experimental results are convincing.

Weaknesses

Many details are missing. For example:
- It is not clear what the action selection criteria are in the paragraph starting in line 227: “Extract Candidate Actions from Debate Flow Tree”.
- When updating the Debate Flow Tree (alg. 2 in Appendix B), how is the ‘action’ determined?
- Even after reading Appendix D, it is not clear to me how the audience feedback works.

Taxonomy

Topics: ERP Systems Implementation and Impact · Outsourcing and Supply Chain Management · Innovation and Knowledge Management