Improving Table Retrieval with Question Generation from Partial Tables
Hsing-Ping Liang, Che-Wei Chang, Yao-Chung Fan

TL;DR
This paper introduces QGpT, a method that uses large language models to generate synthetic questions from partial tables, improving table retrieval by better aligning table representations with user queries.
Contribution
The paper proposes a novel approach to enhance table retrieval by generating synthetic questions from partial tables to improve their embedding representations.
Findings
Significant improvement in retrieval performance across multiple benchmarks.
Effective enhancement for both dense and late-interaction retrievers.
No need to embed entire tables, reducing computational complexity.
Abstract
Recent advances in open-domain question answering over tables have widely adopted large language models (LLMs) under the Retriever-Reader architecture. Prior works have effectively leveraged LLMs to tackle the complex reasoning demands of the Reader component, such as text-to-text, text-to-SQL, and multi-hop reasoning. In contrast, work on the Retriever component has primarily focused on optimizing the query representation: training retrievers to retrieve relevant tables based on questions, or selecting keywords from questions to match table segments. However, little attention has been given to enhancing how tables themselves are represented in embedding space to better align with questions. To address this, we propose QGpT (Question Generation from Partial Tables), a simple yet effective method that uses an LLM to generate synthetic questions based on small portions of a table. These…
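The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a "partial table" is the header plus the first k rows, and that the generated questions are appended to the serialized partial table before embedding; the abstract is truncated before it states how the questions are used, so that second step is an assumption. All function names, the prompt wording, and `toy_generate` (a fixed stand-in for a real LLM call) are hypothetical.

```python
from typing import Callable, List, Sequence


def partial_table(header: Sequence[str], rows: Sequence[Sequence], k: int = 3) -> List[List]:
    """Keep the header and only the first k rows (assumed definition of a partial table)."""
    return [list(header)] + [list(r) for r in rows[:k]]


def serialize(table: Sequence[Sequence]) -> str:
    """Flatten a table into pipe-separated lines, one row per line."""
    return "\n".join(" | ".join(str(cell) for cell in row) for row in table)


def build_prompt(table_text: str, n_questions: int) -> str:
    """Hypothetical prompt asking an LLM for questions the table could answer."""
    return (
        f"Here is part of a table:\n{table_text}\n"
        f"Write {n_questions} questions that this table could answer."
    )


def augment_table(
    header: Sequence[str],
    rows: Sequence[Sequence],
    generate: Callable[[str], List[str]],
    k: int = 3,
    n_questions: int = 2,
) -> str:
    """Return the text to embed: serialized partial table plus synthetic questions."""
    table_text = serialize(partial_table(header, rows, k))
    questions = generate(build_prompt(table_text, n_questions))
    # Assumption: questions are concatenated with the partial table before embedding.
    return table_text + "\n" + "\n".join(questions)


def toy_generate(prompt: str) -> List[str]:
    """Stand-in for an LLM call; returns fixed questions so the sketch runs offline."""
    return [
        "Which city has the largest population?",
        "What is the population of Taipei?",
    ]


header = ["city", "population"]
rows = [["Taipei", 2500000], ["Tainan", 1860000], ["Hsinchu", 450000], ["Keelung", 360000]]
doc = augment_table(header, rows, toy_generate, k=2)
print(doc)
```

The resulting string would then be passed to whichever dense or late-interaction encoder the retriever uses. Only the first k rows are serialized, so the text to embed stays short regardless of table size.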
Taxonomy
Topics: Data Quality and Management · Handwritten Text Recognition Techniques · Web Data Mining and Analysis
