AmbiGraph-Eval: Can LLMs Effectively Handle Ambiguous Graph Queries?

Yuchen Tian; Kaixin Li; Hao Chen; Ziyang Luo; Hongzhan Lin; Sebastian Schelter; Lun Du; Jing Ma

arXiv:2508.09631 · cs.DB · August 14, 2025

TL;DR

This paper introduces AmbiGraph-Eval, a benchmark to evaluate how well large language models handle ambiguous graph queries, revealing significant challenges and gaps in current models' ambiguity resolution capabilities.

Contribution

The paper proposes a taxonomy of graph-query ambiguities and presents AmbiGraph-Eval, a new benchmark for systematically assessing LLMs on ambiguous graph query understanding.

Findings

01 Even the top models among the 9 representative LLMs evaluated struggle with ambiguous graph queries.

02 Ambiguity handling remains a critical challenge for LLMs.

03 The benchmark reveals significant gaps in current models' ambiguity resolution capabilities.

Abstract

Large Language Models (LLMs) have recently demonstrated strong capabilities in translating natural language into database queries, especially when dealing with complex graph-structured data. However, real-world queries often contain inherent ambiguities, and the interconnected nature of graph structures can amplify these challenges, leading to unintended or incorrect query results. To systematically evaluate LLMs on this front, we propose a taxonomy of graph-query ambiguities, comprising three primary types: Attribute Ambiguity, Relationship Ambiguity, and Attribute-Relationship Ambiguity, each subdivided into Same-Entity and Cross-Entity scenarios. We introduce AmbiGraph-Eval, a novel benchmark of real-world ambiguous queries paired with expert-verified graph query answers. Evaluating 9 representative LLMs shows that even top models struggle with ambiguous graph queries. Our findings…
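
The taxonomy in the abstract can be read as a small grid: three ambiguity types, each split into two scenarios, for six categories in total. The sketch below enumerates that grid in Python. It is only an illustration of the structure described above; the class names, the `label` format, and the helper `all_categories` are hypothetical and are not taken from the paper or its benchmark code.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class AmbiguityType(Enum):
    """The three primary ambiguity types named in the abstract."""
    ATTRIBUTE = "Attribute Ambiguity"
    RELATIONSHIP = "Relationship Ambiguity"
    ATTRIBUTE_RELATIONSHIP = "Attribute-Relationship Ambiguity"


class Scenario(Enum):
    """The two scenarios each type is subdivided into."""
    SAME_ENTITY = "Same-Entity"
    CROSS_ENTITY = "Cross-Entity"


@dataclass(frozen=True)
class AmbiguityCategory:
    """One cell of the type x scenario grid (hypothetical container)."""
    kind: AmbiguityType
    scenario: Scenario

    def label(self) -> str:
        # The label format is an assumption made for readability,
        # not the paper's own naming scheme.
        return f"{self.scenario.value} {self.kind.value}"


def all_categories() -> list[AmbiguityCategory]:
    """Enumerate every (type, scenario) pair in declaration order."""
    return [AmbiguityCategory(k, s) for k, s in product(AmbiguityType, Scenario)]


if __name__ == "__main__":
    for category in all_categories():
        print(category.label())
```

A structure like this could be used to tag each benchmark query with its category so that model accuracy can be reported per cell, though how the benchmark actually stores its labels is not stated in the abstract.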
