MIRACL: A Diverse Meta-Reinforcement Learning for Multi-Objective Multi-Echelon Combinatorial Supply Chain Optimisation
Rifny Rachman, Josh Tingey, Richard Allmendinger, Wei Pan, Pradyumn Shukla, Bahrul Ilmi Nasution

TL;DR
MIRACL introduces a hierarchical meta-reinforcement learning framework that enables efficient, few-shot adaptation for multi-objective, multi-echelon supply chain optimisation, outperforming traditional methods in dynamic environments.
Contribution
To the authors' knowledge, MIRACL is the first framework to integrate Meta-MORL with structured subproblem decomposition and Pareto-based meta-learning for combinatorial optimisation tasks.
Findings
MIRACL achieves up to 10% higher hypervolume than baselines.
MIRACL attains 5% better expected utility in experiments.
The framework demonstrates robust adaptation in diverse supply chain scenarios.
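For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how hypervolume and expected utility are commonly computed for a two-objective maximisation problem. It is not taken from the paper: the function names, the reference point, the linear utility form, and the sample values are all illustrative assumptions, and the paper's exact metric definitions may differ.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by `ref` (maximisation).

    Assumption: two objectives only; `ref` is a hypothetical reference point.
    """
    # Ignore points that do not strictly improve on the reference point.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:  # dominated points add no new area
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def expected_utility(points: Sequence[Point], weights: Sequence[Point]) -> float:
    """Mean over weight vectors of the best linear utility any point achieves.

    Assumption: utilities are linear scalarisations w . v of the objectives.
    """
    total = 0.0
    for w in weights:
        total += max(w[0] * x + w[1] * y for x, y in points)
    return total / len(weights)


if __name__ == "__main__":
    front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]  # made-up solution set
    prefs = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]  # made-up preference weights
    print(hypervolume_2d(front, ref=(0.0, 0.0)))
    print(expected_utility(front, prefs))
```

A larger hypervolume indicates a solution set that covers more of the objective space, while a higher expected utility indicates better average performance across user preferences.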
Abstract
Multi-objective reinforcement learning (MORL) is effective for multi-echelon combinatorial supply chain optimisation, where tasks involve high dimensionality, uncertainty, and competing objectives. However, its deployment in dynamic environments is hindered by the need for task-specific retraining and substantial computational cost. We introduce MIRACL (Meta multI-objective Reinforcement leArning with Composite Learning), a hierarchical Meta-MORL framework that enables few-shot generalisation across diverse tasks. MIRACL decomposes each task into structured subproblems for efficient policy adaptation and meta-learns a global policy across tasks, using a Pareto-based adaptation strategy to encourage diversity during meta-training and fine-tuning. To our knowledge, this is the first integration of Meta-MORL with such mechanisms in combinatorial optimisation. Although validated in the…
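The abstract does not spell out the Pareto-based adaptation strategy, so the sketch below only illustrates the underlying notion of Pareto dominance that any such strategy relies on: keeping the candidates that no other candidate beats on every objective. The function names and the sample return vectors are hypothetical, and this is not the authors' selection procedure.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if `a` is at least as good as `b` on every objective and
    strictly better on at least one (all objectives maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Sequence[Vector]) -> List[Vector]:
    """Return the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


if __name__ == "__main__":
    # Made-up objective vectors, e.g. returns of candidate policies.
    returns = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0), (2.0, 1.0)]
    print(pareto_front(returns))
```

Retaining the whole non-dominated set, rather than a single best scalarised candidate, is one simple way to preserve diverse trade-offs between objectives.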
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Reinforcement Learning in Robotics · Vehicle Routing Optimization Methods
