AgentSelect: Benchmark for Narrative Query-to-Agent Recommendation

Yunxiao Shi; Wujiang Xu; Tingwei Chen; Haoning Shang; Ling Yang; Yunfeng Wan; Zhuo Cao; Xing Zi; Dimitris N. Metaxas; Min Xu

arXiv:2603.03761 · cs.AI · March 5, 2026

Open Access

TL;DR

AgentSelect is a benchmark for recommending LLM agents from narrative queries. It addresses the lack of query-conditioned supervision for choosing end-to-end agent configurations, and models trained on it transfer to external agent marketplaces.

Contribution

AgentSelect provides the first unified dataset and evaluation framework for query-to-agent recommendation, covering diverse agent types and supporting capability-sensitive learning.

Findings

01 Content-aware matching outperforms popularity-based methods.

02 Synthesized interactions are learnable and improve coverage.

03 Models trained on AgentSelect transfer effectively to external marketplaces.

Abstract

LLM agents are rapidly becoming the practical interface for task automation, yet the ecosystem lacks a principled way to choose among an exploding space of deployable configurations. Existing LLM leaderboards and tool/agent benchmarks evaluate components in isolation and remain fragmented across tasks, metrics, and candidate pools, leaving a critical research gap: there is little query-conditioned supervision for learning to recommend end-to-end agent configurations that couple a backbone model with a toolkit. We address this gap with AgentSelect, a benchmark that reframes agent selection as narrative query-to-agent recommendation over capability profiles and systematically converts heterogeneous evaluation artifacts into unified, positive-only interaction data. AgentSelect comprises 111,179 queries, 107,721 deployable agents, and 251,103 interaction records aggregated from 40+ sources,…
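
To make the setup concrete, here is a minimal sketch of what positive-only query-to-agent interaction records might look like, and how a popularity baseline differs from content-aware matching over capability profiles. Everything in it is an assumption made for illustration: the toy records, the agent identifiers, the profile texts, and the bag-of-words cosine scorer are hypothetical and are not the benchmark's actual schema or the authors' models.

```python
import math
from collections import Counter

# Hypothetical positive-only interaction records: (narrative query, agent id).
# There are no negative labels; each record only says the agent suited the query.
interactions = [
    ("summarize a pdf report on quarterly sales", "agent_docs"),
    ("search the web for flight prices", "agent_web"),
    ("search the web for hotel deals", "agent_web"),
]

# Hypothetical capability profiles: a text description of backbone plus toolkit.
profiles = {
    "agent_docs": "backbone model with pdf reader and summarize tool for report documents",
    "agent_web": "backbone model with web search and browser tool",
    "agent_code": "backbone model with python interpreter for code and data analysis",
}


def bag(text):
    """Lowercased bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_by_popularity(k=1):
    """Query-independent baseline: rank agents by interaction count."""
    counts = Counter(agent for _, agent in interactions)
    return [agent for agent, _ in counts.most_common(k)]


def recommend_by_content(query, k=1):
    """Query-conditioned baseline: rank agents by profile similarity to the query."""
    q = bag(query)
    ranked = sorted(profiles, key=lambda a: cosine(q, bag(profiles[a])), reverse=True)
    return ranked[:k]


query = "write python code for data analysis"
print(recommend_by_popularity())    # same answer for every query
print(recommend_by_content(query))  # depends on the query text
```

In this toy setting the popularity baseline can never surface `agent_code`, because that agent has no recorded interactions, whereas the content-based scorer can reach it through its profile text. This is only meant to illustrate why query-conditioned, capability-aware matching is a natural fit for the task as framed here.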

Taxonomy

Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Games · Big Data and Digital Economy