Capacity-Aware Planning and Scheduling in Budget-Constrained Multi-Agent MDPs: A Meta-RL Approach

Manav Vora; Ilan Shomorony; Melkior Ornik

arXiv:2410.21249·cs.LG·October 8, 2025

TL;DR

This paper introduces a scalable meta-RL method for capacity-aware planning in large multi-agent MDPs with budget constraints, optimizing repair scheduling for industrial robots.

Contribution

It presents a novel two-stage approach combining LSAP-based grouping and meta-trained PPO policies for efficient, scalable scheduling under capacity and budget constraints.

Findings

1. Outperforms baseline methods in maximizing robot uptime.
2. Effective for large teams with limited repair resources.
3. Scales well with increasing number of agents and technicians.

Abstract

We study capacity- and budget-constrained multi-agent MDPs (CB-MA-MDPs), a class that captures many maintenance and scheduling tasks in which each agent can irreversibly fail and a planner must decide (i) when to apply a restorative action and (ii) which subset of agents to treat in parallel. The global budget limits the total number of restorations, while the capacity constraint bounds the number of simultaneous actions, turning naïve dynamic programming into a combinatorial search that scales exponentially with the number of agents. We propose a two-stage solution that remains tractable for large systems. First, a Linear Sum Assignment Problem (LSAP)-based grouping partitions the agents into r disjoint sets (r = capacity) that maximise diversity in expected time-to-failure, allocating budget to each set proportionally. Second, a meta-trained PPO policy solves each sub-MDP,…
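
The first stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the LSAP cost matrix or say what the budget is proportional to, so the round-based heuristic in the hypothetical `group_agents` and the size-proportional split in the hypothetical `split_budget` are both assumptions made for illustration. Only `scipy.optimize.linear_sum_assignment` is a real library call.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def group_agents(expected_ttf, r):
    """Partition agents into r disjoint groups with spread-out expected TTF.

    Assumed heuristic (not from the paper): agents are sorted by expected
    time-to-failure and handed out in rounds of r. Each round solves an LSAP
    that sends every agent to a group whose current mean TTF is far from
    the agent's own TTF, which pushes each group toward a diverse mix.
    """
    ttf = np.asarray(expected_ttf, dtype=float)
    order = np.argsort(ttf)
    groups = [[] for _ in range(r)]
    for start in range(0, len(order), r):
        batch = order[start:start + r]
        # Empty groups use the global mean as a neutral reference point.
        means = np.array([ttf[g].mean() if g else ttf.mean() for g in groups])
        # gain[i, j]: distance between agent i's TTF and group j's mean TTF.
        gain = np.abs(ttf[batch][:, None] - means[None, :])
        # One agent per group per round; maximize the total distance.
        rows, cols = linear_sum_assignment(gain, maximize=True)
        for i, j in zip(rows, cols):
            groups[j].append(int(batch[i]))
    return groups


def split_budget(groups, budget):
    """Split an integer budget across groups in proportion to group size.

    Proportionality to group size is an assumption; largest-remainder
    rounding keeps the shares integral and summing to the budget.
    """
    n = sum(len(g) for g in groups)
    exact = [budget * len(g) / n for g in groups]
    shares = [int(x) for x in exact]
    by_remainder = sorted(range(len(groups)),
                          key=lambda j: exact[j] - shares[j], reverse=True)
    for j in by_remainder[:budget - sum(shares)]:
        shares[j] += 1
    return shares
```

Calling `group_agents(ttf, r)` followed by `split_budget(groups, budget)` yields r disjoint index lists and one integer budget per list, which is the shape of input the per-group sub-MDP policies in the second stage would need. Because each round places at most one agent in each group, group sizes differ by at most one.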

Taxonomy

Methods: Entropy Regularization · Proximal Policy Optimization