Aligning Compound AI Systems via System-level DPO

Xiangwen Wang; Yibo Jacky Zhang; Zhoujie Ding; Katherine Tsai; Haolun Wu; Sanmi Koyejo

arXiv:2502.17721 · cs.LG · March 9, 2026

TL;DR

This paper introduces SysDPO, a framework for aligning compound AI systems with human preferences. It models a system as a DAG and extends preference optimization to the whole system, addressing two challenges: non-differentiable interactions between components, and system-level preferences that cannot be directly translated into component-level preferences.

Contribution

The paper formulates compound AI systems as DAGs and develops SysDPO, a new method extending DPO for joint system-level alignment, with two variants for different data scenarios.
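
As a rough illustration of what "extending DPO to a DAG" could look like, the sketch below applies the standard DPO loss to a system-level log-probability obtained by summing per-node conditional log-probabilities. It is not the paper's implementation. The two-node graph, the names `DAG`, `system_log_prob`, and `dpo_loss`, and all numbers are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical two-stage system, written as a DAG (node -> parents):
# an LLM turns the user prompt x into an intermediate prompt z,
# and a second model turns z into the final output y.
DAG = {"z": ["x"], "y": ["z"]}


def system_log_prob(node_log_probs):
    """log p(z, y | x) as the sum of per-node conditional log-probs.

    node_log_probs maps each non-input node in DAG to log p(node | parents).
    """
    missing = set(DAG) - set(node_log_probs)
    if missing:
        raise ValueError(f"missing log-probs for nodes: {sorted(missing)}")
    return sum(node_log_probs[node] for node in DAG)


def dpo_loss(policy_w, policy_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    Arguments are log-probs of the preferred (w) and rejected (l) samples
    under the trainable policy and the frozen reference.
    """
    margin = beta * ((policy_w - ref_w) - (policy_l - ref_l))
    # -log(sigmoid(margin)) computed stably as softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up log-probs for one preference pair (illustrative numbers only).
policy_w = system_log_prob({"z": -4.0, "y": -10.0})
policy_l = system_log_prob({"z": -5.0, "y": -12.0})
ref_w = system_log_prob({"z": -4.5, "y": -11.0})
ref_l = system_log_prob({"z": -4.5, "y": -11.0})

loss = dpo_loss(policy_w, policy_l, ref_w, ref_l)
print(loss)
```

Because the loss depends on the summed log-probability, every component that contributes a term is trained from the same system-level preference, with no need to differentiate through the sampled intermediate output. In a real system the per-node log-probabilities would come from the component models; how to obtain them for a given component (for example a diffusion model) is outside the scope of this sketch.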

Findings

01. Effective joint alignment of language and diffusion models.

02. Successful alignment of LLM collaboration systems.

03. Demonstrated improvements in real-world applications.

Abstract

Compound AI systems, comprising multiple interacting components such as LLMs, foundation models, and external tools, have demonstrated remarkable improvements compared to single models in various tasks. To ensure their effective deployment in real-world applications, aligning these systems with human preferences is crucial. However, aligning a compound system via policy optimization, unlike aligning a single model, is challenging for two main reasons: (i) non-differentiable interactions between components make end-to-end gradient-based optimization methods inapplicable, and (ii) system-level preferences cannot be directly transformed into component-level preferences. To address these challenges, we first formulate compound AI systems as Directed Acyclic Graphs (DAGs), explicitly modeling both component interactions and the associated data flows. Building on this formulation, we…

Videos

Aligning Compound AI Systems via System-level DPO · SlidesLive

Taxonomy

Topics: Neural Networks and Applications · AI-based Problem Solving and Planning · Advanced Database Systems and Queries

Methods: Direct Preference Optimization · Diffusion · ALIGN