Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints

Jiayu Li; Enpei Zhang; Dawei Zhou; Elynn Chen; Yujun Yan

arXiv:2605.19140 · cs.AI · May 20, 2026

TL;DR

This paper introduces IC-Q, a provably convergent decentralized Q-learning algorithm for workflow learning in multi-agent systems under interface constraints, gives finite-sample guarantees for it, and validates those guarantees in experiments on several tasks.

Contribution

It formalizes the interface-constrained semi-Markov decision process (IC-SMDP) and develops IC-Q, establishing the first finite-sample guarantee for neural Q-learning under decentralized partial observability.

Findings

01. IC-Q matches centralized oracle performance without joint trajectory access.

02. The finite-sample bound decomposes into neural approximation, interface gap, and mixing-time errors.

03. Experiments across multiple tasks confirm the predicted error scaling and the method's effectiveness.

Abstract

We study workflow learning in a setting where specialized agents hand off control through a shared artifact, each agent observes only a local function of that artifact and its own private state, and no centralized learner accesses joint trajectories -- the operating regime of multi-agent LLM pipelines that span organizational, vendor, or trust boundaries. We formalize this regime as an interface-constrained semi-Markov decision process (IC-SMDP), whose decision epochs occur at handoff times, and design IC-$Q$, an asynchronous decentralized $Q$-learning algorithm in which cross-agent coordination at every handoff is exactly one scalar. Our main result is a finite-sample bound for neural IC-$Q$ that decomposes into three independently controllable error sources: neural function-approximation error, interface representation gap, and a mixing-time residual, under the random option-duration…
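
To make the "exactly one scalar per handoff" idea concrete, here is a minimal tabular sketch. It is not the paper's neural IC-$Q$ algorithm. It assumes that the scalar is the receiving agent's greedy value estimate for its own local observation, and it uses the standard SMDP $Q$-learning target (discounted reward accumulated over the option, plus $\gamma^{\tau}$ times the bootstrap value). The class `LocalAgent`, its method names, and the observation labels are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


class LocalAgent:
    """Hypothetical agent holding a private tabular Q-function over local observations."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        # Q-table keyed by the agent's own local observation; never shared.
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions = n_actions
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, obs):
        """Epsilon-greedy choice over this agent's own options."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[obs]
        return max(range(self.n_actions), key=values.__getitem__)

    def handoff_value(self, obs):
        """Assumed coordination signal: the single scalar max_a Q(obs, a)."""
        return max(self.q[obs])

    def update(self, obs, action, reward_sum, duration, next_scalar):
        """SMDP Q-learning step applied when control is handed off.

        reward_sum  : discounted reward accumulated while the option ran
        duration    : number of primitive steps the option lasted (tau)
        next_scalar : the one scalar received from the next agent
        """
        target = reward_sum + (self.gamma ** duration) * next_scalar
        td_error = target - self.q[obs][action]
        self.q[obs][action] += self.alpha * td_error
        return td_error


# Usage: the sender never sees the receiver's Q-table, only one number.
sender = LocalAgent(n_actions=2, alpha=0.5, gamma=0.5)
receiver = LocalAgent(n_actions=2)
receiver.q["review"] = [1.0, 3.0]
scalar = receiver.handoff_value("review")
sender.update("draft", 0, reward_sum=2.0, duration=2, next_scalar=scalar)
```

In this sketch each agent updates only from its own observation, its own accumulated reward, and the received scalar, so no joint trajectory is ever assembled. The paper's neural function approximation and its handling of random option durations are beyond what this toy covers.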
