Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate

John Seon Keun Yi; Aaron Mueller; Dokyun Lee

arXiv:2604.24881·cs.AI·April 29, 2026

TL;DR

This paper introduces a two-stage fine-tuning method that internalizes multi-agent debate in a single large language model, cutting token usage by up to 93% while matching or exceeding the reasoning performance of explicit debate.

Contribution

It presents a framework for distilling multi-agent debate into a single LLM, making debate-style reasoning cheaper at inference time, and uses activation steering to interpret the resulting model.

Findings

01

Internalized models match or exceed explicit multi-agent debate while using up to 93% fewer tokens.

02

Activation steering reveals agent-specific subspaces: interpretable directions in activation space that correspond to different agent perspectives.

03

Distillation facilitates easier localization and control of harmful behaviors.

Abstract

Multi-agent debate has been shown to improve reasoning in large language models (LLMs). However, it is compute-intensive, requiring generation of long transcripts before answering questions. To address this inefficiency, we develop a framework that distills multi-agent debate into a single LLM through a two-stage fine-tuning pipeline combining debate structure learning with internalization via dynamic reward scheduling and length clipping. Across multiple models and benchmarks, our internalized models match or exceed explicit multi-agent debate performance using up to 93% fewer tokens. We then investigate the mechanistic basis of this capability through activation steering, finding that internalization creates agent-specific subspaces: interpretable directions in activation space corresponding to different agent perspectives. We further demonstrate a practical application: by instilling…
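The abstract describes agent-specific subspaces as directions in activation space that can be probed by steering. As a rough illustration of that general idea only, the sketch below computes a difference-of-means direction between two sets of activations and shifts a hidden state along it. This is a common generic recipe for activation steering, not the authors' implementation; the function names, the toy vectors, and the `alpha` scale are all assumptions made for this example.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_direction(acts_a: List[Vector], acts_b: List[Vector]) -> Vector:
    """Unit-norm difference of means, pointing from set B toward set A."""
    mu_a, mu_b = mean_vector(acts_a), mean_vector(acts_b)
    diff = [a - b for a, b in zip(mu_a, mu_b)]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0.0:
        raise ValueError("the two activation sets have identical means")
    return [d / norm for d in diff]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift one hidden state by alpha along the steering direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 3-dimensional activations (made-up numbers, not taken from the paper).
# In practice these would be hidden states collected from a model layer
# under two different agent perspectives (hypothetical setup).
acts_agent_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
acts_agent_b = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]

direction = steering_direction(acts_agent_a, acts_agent_b)
steered = steer([0.5, 0.5, 0.5], direction, alpha=2.0)
```

With a real model, the shift would typically be applied to a chosen layer's hidden states during the forward pass; which layer and what scale work well are empirical questions the listing does not answer.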

Code & Models

Repositories

johnsk95/latent_agents (GitHub)
