ATLAS: A Multi-LLM Training Framework for EvoDPO with Adaptive Reference Evolution
Ujin Jeon, Jiyong Kwon, Madison Ann Sullivan, Caleb Eunho Lee, Guang Lin

TL;DR
ATLAS is an adaptive multi-agent training framework that uses EvoDPO for continuous policy refinement, overcoming the limitations of static reference models in multi-LLM systems across diverse complex tasks.
Contribution
The paper presents ATLAS, a multi-agent framework that updates its reference policy adaptively via EvoDPO, enabling self-evolution and improved performance in challenging environments.
Findings
ATLAS outperforms fixed-reference and external baselines across the evaluated tasks.
Adaptive reference updates enhance long-term self-improvement.
The framework balances exploration and stability.
Abstract
Recent multi-LLM agent systems have shown promising capabilities for automated problem-solving, yet they predominantly rely on frozen agents or static fine-tuning pipelines. To address this limitation, our primary contribution is ATLAS (Adaptive Task-distributed Learning for Agentic Self-evolution), a multi-agent framework where specialized meta-agents collaboratively train and refine an active agent toward a domain-specific policy. A core challenge in iterative preference learning within these pipelines is the reliance on fixed reference models, which typically leads to overly conservative updates or training stagnation. To overcome this, the framework's algorithmic engine utilizes Evolving Direct Preference Optimization (EvoDPO). EvoDPO employs an inspection agent to perform adaptive, proxy-KL gated reference policy updates based on continuous training telemetry. We evaluate this full…
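The abstract does not spell out the EvoDPO update rule, so the following is only a minimal sketch of what a proxy-KL gated reference update could look like. It assumes the standard DPO loss, a proxy KL estimated as the mean log-probability ratio between policy and reference on policy samples, and a hypothetical gate that allows a reference refresh only when that proxy lies inside a band. The function names, the thresholds `kl_min` and `kl_max`, and the direction of the gate are illustrative assumptions, not details from the paper.

```python
import math


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy (pi_*) and the
    reference model (ref_*). Returns -log(sigmoid(margin)).
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Numerically stable form of -log(sigmoid(margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def proxy_kl(policy_logps, ref_logps):
    """Monte Carlo proxy for KL(policy || reference).

    Averages the log-ratio over responses assumed to be sampled from the policy.
    """
    ratios = [p - r for p, r in zip(policy_logps, ref_logps)]
    return sum(ratios) / len(ratios)


def should_update_reference(kl, kl_min=0.05, kl_max=1.0):
    """Hypothetical gate (an assumption, not the paper's rule).

    Refresh the reference only if the policy has moved away from it
    (kl >= kl_min) without drifting too far (kl <= kl_max).
    """
    return kl_min <= kl <= kl_max


# Illustrative telemetry: log-probs of two sampled responses.
kl = proxy_kl([-1.0, -2.0], [-1.2, -2.2])
if should_update_reference(kl):
    # In a real pipeline this is where the reference weights would be
    # replaced by a copy of the current policy weights.
    reference_refreshed = True
else:
    reference_refreshed = False
```

In this sketch the lower threshold keeps the reference from being refreshed before the policy has learned anything, and the upper threshold blocks a refresh when the policy may have diverged. Both thresholds would need tuning in practice.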
Taxonomy
Topics: Data Stream Mining Techniques · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
