Scalable Join Inference for Large Context Graphs
Shivani Tripathi, Ravi Shetye, Shi Qiao, Alekh Jindal

TL;DR
This paper introduces a hybrid method that combines statistical pruning with Large Language Model (LLM) reasoning to infer join relationships when building context graphs from large database schemas, reducing false-positive joins while remaining scalable.
Contribution
It presents a hybrid approach in which statistical pruning proposes primary key and inclusion dependency candidates and an LLM adjudicates them, scaling join inference to large schemas while reducing false positives.
Findings
Achieves high precision (78-100%) on well-structured schemas.
Scales effectively to large schemas and evolving query workloads.
Highlights challenges in join discovery for poorly normalized schemas.
Abstract
Context graphs are essential for modern AI applications including question answering, pattern discovery, and data analysis. Building accurate context graphs from structured databases requires inferring join relationships between entities. Invalid joins introduce ambiguity and duplicate records, compromising graph quality. We present a scalable join inference approach combining statistical pruning with Large Language Model (LLM) reasoning. Unlike purely statistics-based methods, our hybrid approach mimics human semantic understanding while mitigating LLM hallucination through data-driven inference. We first identify primary key candidates and use LLMs for adjudication, then detect inclusion dependencies with the same two-stage process. This statistics-LLM combination scales to large schemas while maintaining accuracy and minimizing false positives. We further leverage the database query…
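The two-stage idea in the abstract (statistical candidates first, adjudication second) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the thresholds `min_unique_ratio` and `min_containment`, the toy tables, and the `judge` callable standing in for an LLM call are all assumptions made for illustration.

```python
# Hypothetical sketch: statistical pruning for join inference.
# All names, thresholds, and data here are illustrative assumptions.

def primary_key_candidates(rows, min_unique_ratio=1.0):
    """Return columns with no nulls whose distinct-value ratio meets the threshold."""
    if not rows:
        return []
    candidates = []
    for col in rows[0]:
        values = [r[col] for r in rows]
        if any(v is None for v in values):
            continue  # a primary key cannot contain nulls
        if len(set(values)) / len(values) >= min_unique_ratio:
            candidates.append(col)
    return candidates


def inclusion_candidates(tables, pk_candidates, min_containment=1.0):
    """Return (child_table, child_col, parent_table, parent_col) tuples where
    the child's non-null values are contained in a parent key candidate."""
    results = []
    for parent, pk_cols in pk_candidates.items():
        for pk_col in pk_cols:
            parent_values = {r[pk_col] for r in tables[parent]}
            for child, rows in tables.items():
                if child == parent or not rows:
                    continue
                for col in rows[0]:
                    child_values = {r[col] for r in rows if r[col] is not None}
                    if not child_values:
                        continue
                    overlap = len(child_values & parent_values)
                    if overlap / len(child_values) >= min_containment:
                        results.append((child, col, parent, pk_col))
    return results


def adjudicate(candidates, judge):
    """Keep only candidates accepted by `judge`, a stand-in for an LLM call."""
    return [c for c in candidates if judge(c)]


# Toy schema (invented data).
tables = {
    "customers": [
        {"customer_id": 1, "name": "Ann"},
        {"customer_id": 2, "name": "Bob"},
        {"customer_id": 3, "name": "Ann"},
    ],
    "orders": [
        {"order_id": 10, "customer_id": 1, "amount": 5.0},
        {"order_id": 11, "customer_id": 1, "amount": 7.5},
        {"order_id": 12, "customer_id": 3, "amount": 5.0},
    ],
}

pks = {name: primary_key_candidates(rows) for name, rows in tables.items()}
joins = inclusion_candidates(tables, pks)
# A trivial name-matching rule used here in place of LLM adjudication.
accepted = adjudicate(joins, lambda c: c[1] == c[3])
```

In this sketch the statistical stage only narrows the search space. On real data, value containment alone tends to admit coincidental matches (for example, small integer columns that happen to overlap), which is the kind of false positive the abstract says the LLM adjudication step is meant to filter out.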
Taxonomy
Topics: Data Quality and Management · Advanced Graph Neural Networks · Graph Theory and Algorithms
