Aligning Compound AI Systems via System-level DPO
Xiangwen Wang, Yibo Jacky Zhang, Zhoujie Ding, Katherine Tsai, Haolun Wu, Sanmi Koyejo

TL;DR
This paper introduces SysDPO, a novel framework for aligning complex compound AI systems with human preferences by modeling them as DAGs and extending preference optimization techniques, addressing challenges of non-differentiability and system-level preference translation.
Contribution
The paper formulates compound AI systems as DAGs and develops SysDPO, a new method extending DPO for joint system-level alignment, with two variants for different data scenarios.
Findings
Effective joint alignment of language and diffusion models.
Successful alignment of LLM collaboration systems.
Demonstrated improvements in real-world applications.
Abstract
Compound AI systems, comprising multiple interacting components such as LLMs, foundation models, and external tools, have demonstrated remarkable improvements compared to single models in various tasks. To ensure their effective deployment in real-world applications, aligning these systems with human preferences is crucial. However, aligning a compound system via policy optimization, unlike aligning a single model, is challenging for two main reasons: (i) non-differentiable interactions between components make end-to-end gradient-based optimization methods inapplicable, and (ii) system-level preferences cannot be directly transformed into component-level preferences. To address these challenges, we first formulate compound AI systems as Directed Acyclic Graphs (DAGs), explicitly modeling both component interactions and the associated data flows. Building on this formulation, we…
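The abstract is cut off before the objective is stated, so the following is only a minimal sketch of the general idea, not the paper's SysDPO loss. It assumes that the likelihood of one system trajectory factorizes over the DAG's nodes, so per-component log-probabilities add up to a system-level log-probability, and it then plugs that sum into the standard DPO loss. The function names and the node names `llm` and `diffusion` are hypothetical.

```python
import math


def system_logprob(component_logprobs):
    """System-level log-probability of one trajectory through the DAG.

    Assumption: each node's output is conditionally independent of its
    non-parents given its parents, so the joint likelihood is a product of
    per-component terms and the log-probabilities simply add.
    """
    return sum(component_logprobs.values())


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_w, policy_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    policy_w / policy_l: log-probabilities of the preferred and rejected
    outputs under the model being trained; ref_w / ref_l: the same under
    the frozen reference model. Here these are system-level sums.
    """
    margin = beta * ((policy_w - ref_w) - (policy_l - ref_l))
    # -log(sigmoid(margin)) is the same as softplus(-margin)
    return softplus(-margin)


# Hypothetical two-node system: an LLM writes a caption, a diffusion model
# renders it. Each dict holds per-node log-probabilities for one trajectory.
preferred = {"llm": -1.5, "diffusion": -2.5}
rejected = {"llm": -2.0, "diffusion": -4.0}

loss = dpo_loss(
    policy_w=system_logprob(preferred),
    policy_l=system_logprob(rejected),
    ref_w=-5.0,  # made-up reference values for illustration
    ref_l=-5.0,
    beta=0.1,
)
```

Because the preference signal enters only through summed log-probabilities, no gradient has to pass through the discrete hand-off between components; each component's term can be differentiated with respect to its own parameters.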
Taxonomy
Topics: Neural Networks and Applications · AI-based Problem Solving and Planning · Advanced Database Systems and Queries
Methods: Direct Preference Optimization · Diffusion · ALIGN
