Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

Jinyang Wu; Guocheng Zhai; Ruihan Jin; Yuhao Shen; Zhengxi Lu; Fan Zhang; Haoran Luo; Zheng Lian; Zhengqi Wen; Jianhua Tao

arXiv:2605.22177·cs.LG·May 22, 2026

TL;DR

Maestro is a reinforcement learning framework that dynamically orchestrates ensembles of models and skills for multimodal tasks. It outperforms large monolithic models while keeping latency low.

Contribution

The paper introduces a hierarchical, RL-driven orchestration method that combines multiple models and skills without retraining, improving performance on multimodal tasks.

Findings

01 Maestro surpasses GPT-5 and Gemini-2.5-Pro in accuracy on multimodal benchmarks.

02 The learned policy generalizes to unseen models and skills without retraining.

03 Maestro maintains high efficiency with low latency.

Abstract

The proliferation of large language models (LLMs) and modular skills has endowed autonomous agents with increasingly powerful capabilities. Existing frameworks typically rely on monolithic LLMs and fixed logic to interface with these skills. This gives rise to a critical bottleneck: different LLMs offer distinct advantages across diverse domains, yet current frameworks fail to exploit the complementary strengths of models and skills, thereby limiting their performance on downstream tasks. In this paper, we present Maestro (Multimodal Agent for Expert-Skill Targeted Reinforced Orchestration), a Reinforcement Learning (RL)-driven orchestration framework that reframes heterogeneous multimodal tasks as a sequential decision-making process over a hierarchical model-skill registry. Rather than consolidating all knowledge into a single model, Maestro trains a lightweight policy to dynamically…
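
As a rough illustration of what a sequential decision over a hierarchical model-skill registry could look like, the sketch below first picks a model and then picks one of that model's skills. It is not the paper's implementation: the `Registry` class, the `choose` and `orchestrate` functions, the model and skill names, and the score tables are all hypothetical. A fixed score table with epsilon-greedy selection stands in for the learned policy, and no RL training is shown.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical two-level registry: model name -> skills it can invoke."""
    skills_by_model: dict = field(default_factory=dict)

    def models(self):
        return sorted(self.skills_by_model)

    def skills(self, model):
        return list(self.skills_by_model[model])


def choose(options, score_fn, epsilon, rng):
    """Epsilon-greedy pick: explore with probability epsilon, else take the best score."""
    if rng.random() < epsilon:
        return rng.choice(options)
    return max(options, key=score_fn)


def orchestrate(registry, model_scores, skill_scores, epsilon=0.0, seed=0):
    """Two sequential decisions: select a model, then a skill under that model.

    The score tables are placeholders for a trained policy (an assumption).
    """
    rng = random.Random(seed)
    model = choose(registry.models(),
                   lambda m: model_scores.get(m, 0.0), epsilon, rng)
    skill = choose(registry.skills(model),
                   lambda s: skill_scores.get((model, s), 0.0), epsilon, rng)
    return model, skill


# Illustrative names only; none of these come from the paper.
registry = Registry({
    "vision-model": ["ocr", "caption"],
    "text-model": ["search"],
})
model_scores = {"vision-model": 0.9, "text-model": 0.2}
skill_scores = {("vision-model", "ocr"): 0.1,
                ("vision-model", "caption"): 0.7}

print(orchestrate(registry, model_scores, skill_scores))
```

With `epsilon=0.0` the selection is purely greedy, so the example picks the highest-scoring model and then the highest-scoring skill registered under it. Because skills are looked up per model at call time, adding a new model or skill to the registry requires no change to the selection code.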

Code & Models

Repositories

jinyangwu/Maestro
github

Models

🤗
Jinyang23/Maestro-4B
model · 21 dl
