Compass: Optimizing Compound AI Workflows for Dynamic Adaptation
Milos Gravara, Juan Luis Herrera, Stefan Nastic

TL;DR
This paper introduces Compass, a framework for dynamic configuration switching in compound AI systems, optimizing accuracy, latency, and cost under varying loads through offline and online adaptation.
Contribution
Compass provides a novel approach combining offline Pareto-optimal configuration discovery with online runtime adaptation for compound AI workflows.
Findings
Achieves 100% recall in configuration discovery with 57.5% fewer evaluations.
Improves SLO compliance by 71.6% under dynamic loads.
Enhances accuracy by 3-5% over static fast baselines.
Abstract
Compound AI is a distributed intelligence approach that represents a unified system orchestrating specialized AI/ML models with engineered software components into AI workflows. Compound AI production deployments must satisfy accuracy, latency, and cost objectives under varying loads. However, many deployments operate on fixed infrastructure where horizontal scaling is not viable. Existing approaches optimize solely for accuracy and do not consider changes in workload conditions. We observe that compound AI systems can switch between configurations to fit infrastructure capacity, trading accuracy for latency based on current load. This requires discovering multiple Pareto-optimal configurations from a combinatorial search space and determining when to switch between them at runtime. We present Compass, a novel framework that enables dynamic configuration switching through offline…
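The two steps the abstract names, keeping only Pareto-optimal configurations and then switching among them as load changes, can be illustrated with a small sketch. This is not the Compass algorithm: the `Config` fields, the candidate values, and the linear `load_factor` latency model are all hypothetical assumptions made for illustration, and the cost objective is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical workflow configuration with made-up measurements."""
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # latency at a reference load; lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.latency_ms <= b.latency_ms
    better = a.accuracy > b.accuracy or a.latency_ms < b.latency_ms
    return no_worse and better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep configurations that no other configuration dominates, fastest first."""
    front = [c for c in configs if not any(dominates(o, c) for o in configs)]
    return sorted(front, key=lambda c: c.latency_ms)


def select_config(front: list[Config], load_factor: float, slo_ms: float) -> Config:
    """Pick the most accurate configuration predicted to meet the latency SLO.

    Assumption: latency scales linearly with load_factor (a toy model).
    If nothing meets the SLO, fall back to the fastest configuration.
    """
    feasible = [c for c in front if c.latency_ms * load_factor <= slo_ms]
    if feasible:
        return max(feasible, key=lambda c: c.accuracy)
    return min(front, key=lambda c: c.latency_ms)


# Illustrative candidates only; these numbers are not from the paper.
CANDIDATES = [
    Config("large", 0.90, 800.0),
    Config("medium", 0.86, 400.0),
    Config("small", 0.80, 150.0),
    Config("dominated", 0.78, 500.0),  # slower and less accurate than "medium"
]
FRONT = pareto_front(CANDIDATES)

if __name__ == "__main__":
    for load in (1.0, 2.0, 10.0):
        print(load, select_config(FRONT, load, slo_ms=1000.0).name)
```

In this toy setup the selector moves from the most accurate configuration toward faster ones as load rises, which is the accuracy-for-latency trade the abstract describes on fixed infrastructure.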
Taxonomy
Topics: Scientific Computing and Data Management · Software System Performance and Reliability · Advanced Software Engineering Methodologies
