A Hybrid Search for Complex Table Question Answering in Securities Report
Daiki Shirafuji, Koji Tanaka, Tatsuhiko Saito

TL;DR
This paper introduces a hybrid retrieval method combining language models and TF-IDF for complex table question answering, improving accuracy over existing LLMs by effectively extracting relevant table cells.
Contribution
The paper presents a novel cell extraction approach for TQA that estimates headers via hybrid retrieval and trains LLMs with contrastive learning, addressing complex table structures.
Findings
Achieved 74.6% accuracy on the evaluated TQA dataset, outperforming GPT-4o mini.
Effective header estimation improves cell selection in complex tables.
Contrastive training enhances LLM performance in TQA tasks.
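The contrastive-training finding can be illustrated with a small sketch. This summary does not state which contrastive objective the authors use, so the code below assumes an InfoNCE-style loss with in-batch negatives, which is one common choice. The function names, the toy vectors, and the temperature value are hypothetical and for illustration only.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def info_nce(question_vecs, header_vecs, temperature=0.05):
    """InfoNCE-style loss (an assumed objective, not confirmed by the source).

    question_vecs[i] and header_vecs[i] form a positive pair; every other
    header in the batch serves as a negative for question i.
    """
    questions = [normalize(v) for v in question_vecs]
    headers = [normalize(v) for v in header_vecs]
    total = 0.0
    for i, q in enumerate(questions):
        # Cosine similarity of question i to every header, scaled by temperature.
        logits = [sum(a * b for a, b in zip(q, h)) / temperature for h in headers]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy with the matching header as the correct class.
        total += log_z - logits[i]
    return total / len(questions)


# Toy embeddings: aligned pairs should give a much lower loss than swapped pairs.
aligned_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In an actual training loop the vectors would come from the language model being fine-tuned, and the loss would be minimized so that each question moves closer to its correct header than to the other headers in the batch.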
Abstract
Large Language Models (LLMs) have recently gained increasing attention in the domain of Table Question Answering (TQA), particularly for extracting information from tables in documents. However, feeding entire tables into LLMs as long text often leads to incorrect answers because most LLMs cannot inherently capture complex table structures. In this paper, we propose a cell extraction method for TQA that requires no manual header identification, even for complex table headers. Our approach estimates table headers by computing similarities between a given question and individual cells via a hybrid retrieval mechanism that integrates a language model and TF-IDF. We then select as the answer the cells at the intersection of the most relevant row and column. Furthermore, the language model is trained using contrastive learning on a small dataset of question-header pairs to enhance performance. We…
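The retrieval idea in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: header candidates are taken to be the first row and first column of the table (the paper targets more complex header layouts), `toy_embed` is a character-trigram stand-in for the language-model encoder, and the mixing weight `alpha` and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (a dict) per text, using smoothed IDF."""
    docs = [Counter(re.findall(r"\w+", t.lower())) for t in texts]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_embed(text):
    """Stand-in for a language-model encoder (assumption): trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def hybrid_scores(question, candidates, embed=toy_embed, alpha=0.5):
    """Blend 'dense' and TF-IDF similarity between the question and each candidate."""
    vectors = tfidf_vectors([question] + candidates)
    q_sparse, cand_sparse = vectors[0], vectors[1:]
    q_dense = embed(question)
    return [
        alpha * cosine(q_dense, embed(cand)) + (1 - alpha) * cosine(q_sparse, sparse)
        for cand, sparse in zip(candidates, cand_sparse)
    ]


def extract_cell(table, question, **kwargs):
    """Pick the cell where the best-matching row and column headers intersect."""
    row_headers = [row[0] for row in table[1:]]  # assumed: first column
    col_headers = table[0][1:]                   # assumed: first row
    row_scores = hybrid_scores(question, row_headers, **kwargs)
    col_scores = hybrid_scores(question, col_headers, **kwargs)
    best_row = max(range(len(row_headers)), key=row_scores.__getitem__)
    best_col = max(range(len(col_headers)), key=col_scores.__getitem__)
    return table[best_row + 1][best_col + 1]


# Hypothetical table and question, invented for this example.
table = [
    ["Item", "FY2022", "FY2023"],
    ["Net sales", "1,200", "1,350"],
    ["Operating income", "150", "180"],
]
question = "What was the operating income in FY2023?"
answer = extract_cell(table, question)
```

Passing a real sentence encoder as `embed` would require a dense cosine in place of the dict-based one used here. The blend weight is exposed as a parameter because the summary does not say how the two similarity signals are combined.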
Taxonomy
Topics: Data Quality and Management · Handwritten Text Recognition Techniques · Advanced Text Analysis Techniques
