Multi-modal Retrieval of Tables and Texts Using Tri-encoder Models
Bogdan Kostić, Julian Risch, Timo Möller

TL;DR
This paper introduces a multi-modal retrieval approach that uses tri-encoder models to retrieve the texts and tables relevant to a question in open-domain question answering. Dense embeddings outperform sparse embeddings on four of six evaluation datasets, and tri-encoders outperform bi-encoders.
Contribution
It presents a novel tri-encoder architecture for joint encoding of questions, texts, and tables, along with a new multi-modal dataset for training and evaluation.
Findings
Dense transformer embeddings outperform sparse embeddings on four of six evaluation datasets.
Tri-encoders improve retrieval performance over bi-encoders.
The multi-modal dataset is publicly released for further research.
Abstract
Open-domain extractive question answering works well on textual data by first retrieving candidate texts and then extracting the answer from those candidates. However, some questions cannot be answered by text alone but require information stored in tables. In this paper, we present an approach for retrieving both texts and tables relevant to a question by jointly encoding texts, tables and questions into a single vector space. To this end, we create a new multi-modal dataset based on text and table datasets from related work and compare the retrieval performance of different encoding schemata. We find that dense vector embeddings of transformer models outperform sparse embeddings on four out of six evaluation datasets. Comparing different dense embedding models, tri-encoders with one encoder for each question, text and table, increase retrieval performance compared to bi-encoders with…
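The idea of one encoder per input type feeding a single vector space can be illustrated with a small sketch. This is not the authors' implementation: the three encoders below are toy bag-of-words stand-ins for transformer models, the table linearisation (header and cells flattened into tokens) is an assumption, and all names (`encode_question`, `encode_text`, `encode_table`, `retrieve`) are hypothetical. It only shows how texts and tables can be ranked together against a question by dot product.

```python
import math
from collections import Counter


def _embed(tokens):
    """Toy stand-in for a transformer: L2-normalised bag-of-words vector."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def encode_question(question):
    # Question encoder (hypothetical): strips the question mark, then embeds.
    return _embed(question.lower().replace("?", "").split())


def encode_text(passage):
    # Text encoder (hypothetical): strips full stops, then embeds.
    return _embed(passage.lower().replace(".", "").split())


def encode_table(table):
    # Table encoder (hypothetical). Assumption: the table is linearised
    # into its header and cell values before embedding.
    cells = [str(c) for row in [table["header"]] + table["rows"] for c in row]
    return _embed(" ".join(cells).lower().split())


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


def retrieve(question, index, top_k=2):
    """Rank texts and tables together by similarity to the question."""
    q = encode_question(question)
    scored = [(dot(q, vec), doc_id) for doc_id, vec in index]
    return sorted(scored, reverse=True)[:top_k]


# A mixed collection: two passages and one table (made-up example data).
documents = [
    ("text:berlin", "text", "Berlin is the capital of Germany."),
    ("table:population", "table", {
        "header": ["city", "population"],
        "rows": [["Berlin", 3700000], ["Hamburg", 1900000]],
    }),
    ("text:rhine", "text", "The Rhine flows through Cologne."),
]

# Each document is embedded by the encoder for its own modality,
# but all vectors live in the same space and share one index.
index = [
    (doc_id, encode_table(doc) if kind == "table" else encode_text(doc))
    for doc_id, kind, doc in documents
]

for query in ("What population does Hamburg have?",
              "Which city is the capital of Germany?"):
    print(query, "->", retrieve(query, index, top_k=1)[0][1])
```

In this toy collection the first query ranks the table highest and the second ranks the Berlin passage highest, which is the behaviour a shared index over both modalities is meant to provide. In a trained system the encoders would be learned so that matching question, text, and table vectors end up close together; here the overlap of surface tokens plays that role.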
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
