Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints
Jiayu Li, Enpei Zhang, Dawei Zhou, Elynn Chen, Yujun Yan

TL;DR
This paper introduces a provably convergent decentralized Q-learning algorithm for workflow learning in multi-agent systems with interface constraints, proves finite-sample guarantees for it, and validates those guarantees experimentally across multiple tasks.
Contribution
It formalizes the interface-constrained semi-Markov decision process (IC-SMDP) and develops IC-Q, a decentralized Q-learning algorithm for which it proves what it describes as the first finite-sample neural Q-learning guarantee under decentralized partial observability.
Findings
IC-Q matches centralized oracle performance without joint trajectory access.
The finite-sample bound decomposes into neural approximation, interface gap, and mixing-time errors.
Experiments across multiple tasks confirm the predicted error scaling and the effectiveness of IC-Q.
Abstract
We study workflow learning in a setting where specialized agents hand off control through a shared artifact, each agent observes only a local function of that artifact and its own private state, and no centralized learner accesses joint trajectories. This is the operating regime of multi-agent LLM pipelines that span organizational, vendor, or trust boundaries. We formalize this regime as an interface-constrained semi-Markov decision process (IC-SMDP), whose decision epochs occur at handoff times, and design IC-Q, an asynchronous decentralized Q-learning algorithm in which cross-agent coordination at every handoff is exactly one scalar. Our main result is a finite-sample bound for neural IC-Q that decomposes into three independently controllable error sources: neural function-approximation error, interface representation gap, and a mixing-time residual, under the random option-duration…
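The one-scalar handoff described in the abstract can be illustrated with a minimal tabular sketch. This is not the paper's IC-Q algorithm, which uses neural function approximation and whose exact update rule is not given in the abstract. It is a standard SMDP Q-learning update under the assumption that the scalar passed at handoff is the receiving agent's greedy value estimate at its own local observation. The class `LocalAgent`, its methods, and the observation labels `"draft"` and `"review"` are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class LocalAgent:
    """Hypothetical agent holding a Q-table over its own local observations only."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95):
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # per-step discount factor
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def value(self, obs):
        # Assumed handoff message: the greedy value at the local observation.
        return max(self.q[obs])

    def update(self, obs, action, reward_sum, duration, handoff_scalar):
        # SMDP Q-learning target: discounted reward accumulated while this
        # agent held control, plus gamma**duration times the single scalar
        # reported by the agent that takes over.
        target = reward_sum + (self.gamma ** duration) * handoff_scalar
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])


# Usage: the sender learns from one scalar and never sees the receiver's
# observation, Q-table, or trajectory.
sender = LocalAgent(n_actions=2, alpha=0.5, gamma=0.5)
receiver = LocalAgent(n_actions=2, alpha=0.5, gamma=0.5)
receiver.q["review"][0] = 4.0

scalar = receiver.value("review")  # the only cross-agent message
sender.update("draft", action=1, reward_sum=1.0, duration=2, handoff_scalar=scalar)
print(sender.q["draft"])  # [0.0, 1.0]
```

In this sketch the target is 1.0 + 0.5^2 * 4.0 = 2.0, so with a learning rate of 0.5 the updated entry moves from 0.0 to 1.0. The random option duration enters only through the exponent on the discount factor, which is why decision epochs can be placed at handoff times.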
