TL;DR
FlowCompile is a compiler that optimizes structured LLM workflows at compile time, creating a reusable set of configurations to balance accuracy and latency across diverse tasks.
Contribution
It introduces a novel compile-time approach for global workflow optimization, outperforming existing routing-based methods without retraining.
Findings
FlowCompile achieves up to 6.4x speedup over baselines.
It constructs a reusable configuration set for flexible deployment.
The method outperforms heuristics and routing-based baselines across benchmarks.
Abstract
Structured LLM workflows, where specialized LLM sub-agents execute according to a predefined graph, have become a powerful abstraction for solving complex tasks. Optimizing such workflows, i.e., selecting configurations for each sub-agent to balance accuracy and latency, is challenging due to the combinatorial design space over model choices, reasoning budgets, and workflow structures. Existing cost-aware methods largely treat workflow optimization as a routing problem, selecting a configuration at inference time for each query according to the accuracy-latency objective used during training. We argue that structured LLM workflows can also be optimized from a compilation perspective: before deployment, the system can globally explore the workflow design space and construct a reusable set of workflow-level configurations spanning diverse accuracy-latency trade-offs. Drawing inspiration…
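The compile-then-select idea described in the abstract can be illustrated with a toy sketch. This is not FlowCompile's actual algorithm: it assumes a two-stage sequential workflow, a hypothetical option table (`OPTIONS`) with made-up accuracy and latency numbers, a simplistic cost model in which accuracies multiply and latencies add, and exhaustive enumeration in place of whatever search strategy the paper uses. It shows only the general shape: build a set of Pareto-optimal workflow-level configurations once, then pick from that set at deployment time without re-optimizing.

```python
from itertools import product
from typing import NamedTuple, Optional


class Config(NamedTuple):
    choices: tuple    # one option name per sub-agent, in workflow order
    accuracy: float   # estimated end-to-end accuracy (toy model)
    latency: float    # estimated end-to-end latency in seconds (toy model)


# Hypothetical options per sub-agent: (name, accuracy factor, latency in seconds).
# All names and numbers are invented for illustration.
OPTIONS = {
    "planner": [("small", 0.80, 1.0), ("medium", 0.78, 2.0), ("large", 0.95, 4.0)],
    "solver": [("small", 0.70, 2.0), ("large", 0.90, 6.0)],
}


def evaluate(choice):
    """Toy cost model: accuracies multiply and latencies add along the chain."""
    accuracy, latency = 1.0, 0.0
    for _, acc, lat in choice:
        accuracy *= acc
        latency += lat
    return accuracy, latency


def dominates(a: Config, b: Config) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.latency <= b.latency
    better = a.accuracy > b.accuracy or a.latency < b.latency
    return no_worse and better


def compile_workflow(options) -> list:
    """Enumerate every workflow-level configuration and keep the Pareto frontier."""
    candidates = []
    for choice in product(*options.values()):
        accuracy, latency = evaluate(choice)
        names = tuple(name for name, _, _ in choice)
        candidates.append(Config(names, accuracy, latency))
    frontier = [c for c in candidates
                if not any(dominates(o, c) for o in candidates)]
    return sorted(frontier, key=lambda c: c.latency)


def select(frontier, latency_budget: float) -> Optional[Config]:
    """Deployment-time lookup: most accurate configuration within the budget."""
    feasible = [c for c in frontier if c.latency <= latency_budget]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None


frontier = compile_workflow(OPTIONS)
for cfg in frontier:
    print(cfg.choices, round(cfg.accuracy, 3), cfg.latency)
print(select(frontier, latency_budget=6.5))
```

In this toy setup the "medium" planner is never selected, because the "small" planner is both faster and more accurate, so every configuration using it is dominated and dropped from the reusable set. Changing the latency budget at deployment only changes the lookup in `select`; the frontier itself is computed once.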
