An Agentic Approach to Metadata Reasoning
Jiani Zhang, Sercan O. Arik, Cosmin Arad, Fatma Ozcan, Alon Halevy

TL;DR
The paper presents the Metadata Reasoner, an agentic system that reasons over metadata to identify a minimal set of relevant data sources for complex analytical tasks, and outperforms existing methods on both real-world and synthetic benchmarks.
Contribution
Introduces the Metadata Reasoner, a novel agentic approach that autonomously reasons over metadata to improve data source selection for complex analytical tasks.
Findings
Achieves an average F1-score of 83.16% on KramaBench datasets, surpassing baselines by 32 percentage points.
Maintains an 85.5% F1-score in noisy settings that contain redundant or low-quality data.
Demonstrates a 99% success rate in avoiding low-quality data in synthetic benchmarks.
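The F1 figures above score the data sources a system selects for a task. The sketch below shows one common set-based way to compute such a score. It assumes that each task has a ground-truth set of required sources and that selection is scored per task. The summary does not specify the exact scoring or averaging protocol, and the function name `selection_f1` is hypothetical.

```python
def selection_f1(selected: set[str], gold: set[str]) -> float:
    """Set-based F1 between selected and ground-truth sources (assumed metric, not the paper's code)."""
    hits = len(selected & gold)  # sources that are both selected and required
    if hits == 0:
        return 0.0  # also covers empty `selected` or empty `gold`
    precision = hits / len(selected)  # penalizes extra, unneeded sources
    recall = hits / len(gold)  # penalizes missing required sources
    return 2 * precision * recall / (precision + recall)
```

For example, selecting {"a", "b", "c"} when only {"a", "b"} is needed gives precision 2/3 and recall 1, so F1 = 0.8. Precision is the part of the score that rewards the minimality the paper targets.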
Abstract
As LLM-driven autonomous agents evolve to perform complex, multi-step tasks that require integrating multiple datasets, the problem of discovering relevant data sources becomes a key bottleneck. Beyond the challenge posed by the sheer volume of available data sources, data-source selection is difficult because the semantics of data are extremely nuanced and require considering many aspects of the data. To address this, we introduce the Metadata Reasoner, an agentic approach to metadata reasoning, designed to identify a small set of data sources that are both sufficient and minimal for a given analytical task. The Metadata Reasoner leverages a table-search engine to retrieve candidate tables, and then autonomously consults various aspects of the available metadata to determine whether the candidates fit the requirements of the task. We demonstrate the effectiveness of the Metadata…
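The abstract describes a retrieve-then-verify pattern. A search engine proposes candidate tables, and metadata is then consulted to decide whether those candidates are sufficient for the task and whether any are unnecessary. The sketch below is a deterministic stand-in for that reasoning, not the paper's method. The paper uses an LLM agent and a table-search engine, and neither is modeled here. Instead, it assumes that each candidate comes with a hypothetical metadata record (`TableMeta`, with `columns` and a `quality_ok` flag) and that the task's needs can be expressed as a set of required columns. It then applies greedy set cover, a swapped-in heuristic, to pick a small covering set. All names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TableMeta:
    """Hypothetical metadata record for one candidate table."""
    name: str
    columns: set[str]
    quality_ok: bool = True  # stand-in for quality/redundancy metadata


def select_sources(required: set[str], candidates: list[TableMeta]) -> list[str]:
    """Greedy set-cover stand-in for sufficiency/minimality reasoning.

    Returns names of tables whose columns jointly cover `required`.
    The result is small but not guaranteed minimal. Raises ValueError
    when the usable candidates cannot cover the requirements.
    """
    # Step 1: drop candidates whose metadata flags them as low quality.
    usable = [t for t in candidates if t.quality_ok]
    uncovered = set(required)
    chosen: list[str] = []
    while uncovered:
        # Step 2: take the table that covers the most still-missing columns.
        best = max(usable, key=lambda t: len(t.columns & uncovered), default=None)
        if best is None or not (best.columns & uncovered):
            raise ValueError(f"no usable source covers: {sorted(uncovered)}")
        chosen.append(best.name)
        uncovered -= best.columns
        usable.remove(best)
    return chosen
```

With candidates `sales_2023` (region, revenue, date), a low-quality duplicate of it, and `regions` (region, population), a request for {"revenue", "population"} returns `["sales_2023", "regions"]`. The duplicate is skipped because of its quality flag. Greedy cover is only an approximation, so an agent that reasons over richer metadata could find a smaller or better-justified set.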
