DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning
Ahmed G.A.H Ahmed, C. Okan Sakar

TL;DR
DW-Bench is a new benchmark for evaluating LLMs on graph-topology reasoning in data warehouses, focusing on foreign-key and data-lineage edges, with experiments highlighting the benefits of tool augmentation.
Contribution
Introduces DW-Bench, a comprehensive benchmark for LLM reasoning on data warehouse schemas, including a large set of verifiable questions and analysis of tool-augmented methods.
Findings
Tool-augmented methods outperform static approaches.
Performance plateaus on hard compositional subtypes.
Benchmark includes 1,046 questions across five schemas.
Abstract
This paper introduces DW-Bench, a new benchmark that evaluates large language models (LLMs) on graph-topology reasoning over data warehouse schemas, explicitly integrating both foreign-key (FK) and data-lineage edges. The benchmark comprises 1,046 automatically generated, verifiably correct questions across five schemas. Experiments show that tool-augmented methods substantially outperform static approaches but plateau on hard compositional subtypes.
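To make the task concrete, the following is a minimal sketch of how a question over FK and lineage edges could be answered, and therefore verified, by plain graph traversal. It is not the benchmark's actual generator or data: the table names, the `EDGES` list, and the helper functions are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical toy schema (not taken from DW-Bench).
# Each edge is (source, target, kind):
#   "fk"      : source table has a foreign key referencing target
#   "lineage" : data flows from source into target
EDGES = [
    ("fact_sales", "dim_customer", "fk"),
    ("fact_sales", "dim_product", "fk"),
    ("raw_orders", "stg_orders", "lineage"),
    ("stg_orders", "fact_sales", "lineage"),
    ("fact_sales", "agg_daily_sales", "lineage"),
]


def build_adjacency(edges, edge_type):
    """Return {source: {targets}} restricted to one edge type."""
    adjacency = {}
    for src, dst, kind in edges:
        if kind == edge_type:
            adjacency.setdefault(src, set()).add(dst)
    return adjacency


def reachable(adjacency, start):
    """Breadth-first search: every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def downstream_tables(edges, table):
    """Tables that receive data, directly or transitively, from table."""
    return sorted(reachable(build_adjacency(edges, "lineage"), table))


def fk_targets_of_downstream(edges, table):
    """Compositional query: follow lineage first, then one FK hop."""
    fk = build_adjacency(edges, "fk")
    targets = set()
    for downstream in reachable(build_adjacency(edges, "lineage"), table):
        targets |= fk.get(downstream, set())
    return sorted(targets)


if __name__ == "__main__":
    print(downstream_tables(EDGES, "raw_orders"))
    print(fk_targets_of_downstream(EDGES, "raw_orders"))
```

The second query combines both edge types (lineage reachability followed by an FK hop), which is a simple illustration of what a compositional question might look like. Because the answer is computed deterministically from the graph, a model's response can be checked exactly against it.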
