GRETEL: A Goal-driven Retrieval and Execution-based Trial Framework for LLM Tool Selection Enhancing
Zongze Wu, Yani Guo, Churong Liang, Runnan Li

TL;DR
GRETEL is an execution-grounded framework for tool selection in LLM-based agents. It validates the functional viability of retrieved tools through sandboxed execution cycles, which improves retrieval accuracy over methods that rely on semantic similarity alone.
Contribution
The paper presents GRETEL, a novel execution-based validation framework that enhances tool retrieval accuracy by systematically testing functional viability, addressing the semantic-functional gap in agent systems.
Findings
Pass Rate (at 10) increased from 0.690 to 0.826
Recall (at 10) improved from 0.841 to 0.867
NDCG (at 10) rose from 0.807 to 0.857
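For readers unfamiliar with the ranking metrics listed above, the following is a minimal sketch of how Recall and NDCG at a cutoff k are commonly computed. It assumes binary relevance labels and a default cutoff of 10; the function names are illustrative, and the paper's exact evaluation code may differ. Pass Rate is not sketched because it depends on the outcome of actual tool executions.

```python
import math

def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant tools that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for tool in ranked[:k] if tool in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    # Each relevant tool at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, tool in enumerate(ranked[:k])
        if tool in relevant
    )
    # The ideal ranking places all relevant tools first (capped at k).
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical example: two of three relevant tools are retrieved in the top 3.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "x"}
recall = recall_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=3)
```

Both metrics reward retrieving the right tools, but NDCG also rewards placing them higher in the list, which is why a re-ranking step can raise NDCG more than Recall.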
Abstract
Despite remarkable advances in Large Language Model capabilities, tool retrieval for agent-based systems remains fundamentally limited by reliance on semantic similarity, which fails to capture functional viability. Current methods often retrieve textually relevant but functionally inoperative tools due to parameter mismatches, authentication failures, and execution constraints, a phenomenon we term the semantic-functional gap. We introduce GRETEL to address this gap through systematic empirical validation. GRETEL implements an agentic workflow that processes semantically retrieved candidates through sandboxed plan-execute-evaluate cycles, generating execution-grounded evidence to distinguish truly functional tools from merely descriptive matches. Our comprehensive evaluation on the ToolBench benchmark demonstrates substantial improvements across all metrics: Pass Rate (at 10)…
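The idea of re-ranking semantically retrieved candidates with execution evidence can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` class, the `trial` and `rerank` functions, the `"ok"` result field, and the example tools are all hypothetical, the planning step is stubbed out as a fixed argument dictionary, and plain Python callables stand in for a real sandbox.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class Candidate:
    name: str
    semantic_score: float                          # score from the first-stage retriever
    call: Callable[[Dict[str, Any]], Dict[str, Any]]  # stand-in for a sandboxed tool call

def trial(candidate: Candidate, plan_args: Dict[str, Any]) -> bool:
    """Run one execute-evaluate step; return True if the call looks viable."""
    try:
        result = candidate.call(plan_args)   # execute (assumed to be sandboxed)
    except Exception:
        return False                         # e.g. missing parameter or auth failure
    return bool(result.get("ok"))            # evaluate the returned payload

def rerank(candidates: List[Candidate], plan_args: Dict[str, Any]) -> List[str]:
    """Order tools by execution success first, then by semantic score."""
    scored = [(trial(c, plan_args), c.semantic_score, c.name) for c in candidates]
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [name for _, _, name in scored]

# Hypothetical tools: the higher-scoring one needs an API key the agent lacks.
def needs_key(args: Dict[str, Any]) -> Dict[str, Any]:
    return {"ok": True, "data": args["api_key"]}   # raises KeyError without a key

def works(args: Dict[str, Any]) -> Dict[str, Any]:
    return {"ok": True, "data": f"forecast for {args['city']}"}

candidates = [
    Candidate("weather_pro", 0.92, needs_key),
    Candidate("weather_basic", 0.85, works),
]
ranking = rerank(candidates, {"city": "Paris"})
```

In this toy case the textually closer tool fails on a missing parameter, so the executable one is promoted, which is the kind of semantic-functional gap the abstract describes.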
Taxonomy
Topics: Topic Modeling · Scientific Computing and Data Management · Semantic Web and Ontologies
