Exploring Multi-Table Retrieval Through Iterative Search
Allaa Boutaleb, Bernd Amann, Rafael Angarita, Hubert Naacke

TL;DR
This paper introduces an iterative search framework for multi-table retrieval in open-domain question answering. The framework balances relevance, coverage, and joinability, reaching accuracy competitive with exact optimization methods while running substantially faster.
Contribution
It presents a novel iterative heuristic approach for multi-table retrieval that is scalable, interpretable, and effective, outperforming exact optimization methods in speed while maintaining competitive accuracy.
Findings
Achieves retrieval performance competitive with MIP-based methods.
Runs 4-400x faster than exact optimization methods, depending on the benchmark.
Demonstrates effectiveness across 5 NL2SQL benchmarks.
Abstract
Open-domain question answering over datalakes requires retrieving and composing information from multiple tables, a challenging subtask that demands semantic relevance and structural coherence (e.g., joinability). While exact optimization methods like Mixed-Integer Programming (MIP) can ensure coherence, their computational complexity is often prohibitive. Conversely, simpler greedy heuristics that optimize for query coverage alone often fail to find these coherent, joinable sets. This paper frames multi-table retrieval as an iterative search process, arguing this approach offers advantages in scalability, interpretability, and flexibility. We propose a general framework and a concrete instantiation: a fast, effective Greedy Join-Aware Retrieval algorithm that holistically balances relevance, coverage, and joinability. Experiments across 5 NL2SQL benchmarks demonstrate that our…
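To make the idea of balancing relevance, coverage, and joinability concrete, here is a minimal Python sketch of a greedy join-aware selection loop. It is not the authors' algorithm: the `Table` fields, the shared-column joinability test, the additive scoring, and the weights `w_rel`, `w_cov`, and `w_join` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    # All fields are hypothetical stand-ins, not the paper's data model.
    name: str
    columns: frozenset   # column names, used for a naive joinability check
    relevance: float     # assumed precomputed query-table similarity in [0, 1]
    covers: frozenset    # assumed set of query terms this table can answer


def joinable(a: Table, b: Table) -> bool:
    """Naive stand-in for joinability: the two tables share a column name."""
    return bool(a.columns & b.columns)


def greedy_join_aware(tables, query_terms, k, w_rel=1.0, w_cov=1.0, w_join=1.0):
    """Pick up to k tables, one per iteration, by a weighted additive score."""
    selected, covered = [], set()
    candidates = list(tables)
    while candidates and len(selected) < k:
        def score(t):
            # Coverage gain: fraction of query terms newly covered by t.
            gain = len((t.covers & query_terms) - covered) / max(len(query_terms), 1)
            # Joinability: 1 if t joins with any selected table (or none selected yet).
            join = 1.0 if not selected else float(any(joinable(t, s) for s in selected))
            return w_rel * t.relevance + w_cov * gain + w_join * join

        best = max(candidates, key=score)
        selected.append(best)
        covered |= best.covers & query_terms
        candidates.remove(best)
    return selected


# Toy data lake: "people" is slightly more relevant than "customers",
# but only "customers" shares a column with "orders".
tables = [
    Table("orders", frozenset({"order_id", "customer_id", "total"}), 0.9, frozenset({"total"})),
    Table("customers", frozenset({"customer_id", "name"}), 0.5, frozenset({"name"})),
    Table("people", frozenset({"person_id", "name"}), 0.6, frozenset({"name"})),
]
query_terms = {"total", "name"}

picked = greedy_join_aware(tables, query_terms, k=2)
picked_no_join = greedy_join_aware(tables, query_terms, k=2, w_join=0.0)
print([t.name for t in picked])          # join-aware selection
print([t.name for t in picked_no_join])  # relevance and coverage only
```

In this toy setup the join-aware run selects `orders` and `customers`, while setting `w_join=0.0` selects `orders` and `people`, a pair that covers the query terms but cannot be joined. This mirrors the failure mode the abstract attributes to coverage-only greedy heuristics.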
Taxonomy
Topics: Advanced Database Systems and Queries · SAS software applications and methods · Data Quality and Management
