Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale
Hao Li, Chunjiang Mu, Jianhao Chen, Siyue Ren, Zhiyao Cui, Yiqun Zhang, Lei Bai, Shuyue Hu

TL;DR
This paper introduces AgentSkillOS, a framework for managing and orchestrating agent skills at ecosystem scale, demonstrating that structured skill organization and DAG-based orchestration significantly improve task performance.
Contribution
The paper presents the first principled framework for skill selection and orchestration at ecosystem scale, combining capability-tree organization with DAG-based execution, and validates it on a benchmark of 30 artifact-rich tasks spanning five categories.
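To make the capability-tree idea concrete, here is a minimal Python sketch of building a tree by recursive categorization and then retrieving skills by descending it. It is an illustration under stated assumptions, not the authors' implementation. The paper categorizes nodes with an LLM, whereas here a hypothetical `categorize` function simply reads pre-assigned capability tags, and descent uses keyword matching on node labels in place of model judgment. `Skill`, `Node`, `MAX_LEAF`, `build_tree`, `retrieve`, and the example skill names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_LEAF = 2  # hypothetical: a node holding more skills than this gets split


@dataclass
class Skill:
    name: str
    tags: tuple[str, ...]  # hypothetical capability labels, coarse to fine


@dataclass
class Node:
    label: str
    children: dict[str, Node] = field(default_factory=dict)
    skills: list[Skill] = field(default_factory=list)


def categorize(skill: Skill, depth: int) -> str | None:
    # Hypothetical stand-in for the LLM categorization step: return the
    # skill's pre-assigned tag at this depth, or None if it has no finer tag.
    return skill.tags[depth] if depth < len(skill.tags) else None


def build_tree(label: str, skills: list[Skill], depth: int = 0) -> Node:
    node = Node(label)
    if len(skills) <= MAX_LEAF:  # small enough: stop recursing
        node.skills = list(skills)
        return node
    groups: dict[str, list[Skill]] = {}
    for s in skills:
        key = categorize(s, depth)
        if key is None:
            node.skills.append(s)  # no finer category: keep it at this node
        else:
            groups.setdefault(key, []).append(s)
    for key, members in groups.items():
        node.children[key] = build_tree(key, members, depth + 1)
    return node


def all_skills(node: Node) -> list[Skill]:
    out = list(node.skills)
    for child in node.children.values():
        out.extend(all_skills(child))
    return out


def retrieve(node: Node, terms: set[str]) -> list[Skill]:
    # Follow only children whose label appears in the query; if none match,
    # return everything under this node as a fallback.
    matched = [c for label, c in node.children.items() if label in terms]
    if not matched:
        return all_skills(node)
    out: list[Skill] = []
    for child in matched:
        out.extend(retrieve(child, terms))
    return out


SKILLS = [
    Skill("pdf-writer", ("document", "pdf")),
    Skill("docx-writer", ("document", "docx")),
    Skill("pptx-builder", ("document", "slides")),
    Skill("chart-plotter", ("visual", "chart")),
    Skill("poster-designer", ("visual", "poster")),
    Skill("web-scraper", ("web", "scrape")),
]

tree = build_tree("root", SKILLS)
query = set("create a pdf document".split())
print([s.name for s in retrieve(tree, query)])  # → ['pdf-writer']
```

Descending only into matching branches means the retriever inspects a few labels per level instead of every skill in the ecosystem; when no child matches, this sketch falls back to returning the whole subtree.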
Findings
Tree-based skill retrieval approximates oracle selection.
DAG-based skill orchestration outperforms flat invocation.
Structured composition enhances skill utilization and task quality.
Abstract
The rapid proliferation of Claude agent skills has raised the central question of how to effectively leverage, manage, and scale the agent skill ecosystem. In this paper, we propose AgentSkillOS, the first principled framework for skill selection, orchestration, and ecosystem-level management. AgentSkillOS comprises two stages: (i) Manage Skills, which organizes skills into a capability tree via node-level recursive categorization for efficient discovery; and (ii) Solve Tasks, which retrieves, orchestrates, and executes multiple skills through DAG-based pipelines. To evaluate the agent's ability to invoke skills, we construct a benchmark of 30 artifact-rich tasks across five categories: data computation, document creation, motion video, visual design, and web interaction. We assess the quality of task outputs using LLM-based pairwise evaluation, and the results are aggregated via a…
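The execution side of a DAG-based pipeline can be sketched with Python's standard `graphlib.TopologicalSorter`: skills whose upstream dependencies have all finished run together in one stage, and each skill receives its predecessors' outputs. This sketch assumes the plan (which skills to use and how they depend on each other) already exists; in AgentSkillOS that retrieval and orchestration step is model-driven and is not reproduced here. The skill names, the `run_dag` helper, and the toy callables are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Any, Callable

SkillFn = Callable[[dict[str, Any]], Any]


def run_dag(skills: dict[str, SkillFn], deps: dict[str, set[str]]):
    """Run skills stage by stage; deps maps each skill to its upstream skills."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    outputs: dict[str, Any] = {}
    stages: list[list[str]] = []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # all upstream outputs exist
            # Independent skills in the same stage run concurrently.
            futures = {
                name: pool.submit(
                    skills[name], {d: outputs[d] for d in deps.get(name, set())}
                )
                for name in ready
            }
            # Collect the whole stage before asking for the next ready set.
            for name, fut in futures.items():
                outputs[name] = fut.result()
                sorter.done(name)
            stages.append(ready)
    return outputs, stages


# Hypothetical plan for "build a report with a chart".
SKILLS = {
    "fetch-data": lambda inp: [3, 1, 2],
    "plot-chart": lambda inp: f"chart({len(inp['fetch-data'])} points)",
    "summarize": lambda inp: f"max={max(inp['fetch-data'])}",
    "write-pdf": lambda inp: f"report[{inp['plot-chart']}, {inp['summarize']}]",
}
DEPS = {
    "fetch-data": set(),
    "plot-chart": {"fetch-data"},
    "summarize": {"fetch-data"},
    "write-pdf": {"plot-chart", "summarize"},
}

outputs, stages = run_dag(SKILLS, DEPS)
print(stages)  # → [['fetch-data'], ['plot-chart', 'summarize'], ['write-pdf']]
print(outputs["write-pdf"])  # → report[chart(3 points), max=3]
```

Here `plot-chart` and `summarize` share a stage because both depend only on `fetch-data`, whereas a strictly sequential invocation would run them one after another and leave the data flow between skills implicit.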
