TL;DR
This paper introduces a modular architecture for reasoning over text-based databases. It answers complex database-style queries posed in natural language accurately and at scale, and it outperforms existing transformer models, especially on larger databases.
Contribution
The paper presents a novel modular architecture that scales to large text-based databases and effectively handles complex queries involving joins, filtering, and aggregation.
Findings
Improves accuracy from 85% to 90% on small databases.
Scales to thousands of facts, unlike baseline models.
Maintains accuracy on large databases where transformers fail.
Abstract
Neural models have shown impressive performance gains in answering queries from natural language text. However, existing works are unable to support database queries, such as "List/Count all female athletes who were born in 20th century", which require reasoning over sets of relevant facts with operations such as join, filtering and aggregation. We show that while state-of-the-art transformer models perform very well for small databases, they exhibit limitations in processing noisy data, numerical operations, and queries that aggregate facts. We propose a modular architecture to answer these database-style queries over multiple spans from text and aggregating these at scale. We evaluate the architecture using WikiNLDB, a novel dataset for exploring such queries. Our architecture scales to databases containing thousands of facts whereas contemporary models are limited by how many facts…
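To make the query type concrete, here is a minimal Python sketch of what the abstract's example query ("List/Count all female athletes who were born in 20th century") requires once facts are available in structured form: a join of facts about the same entity, a filter, and an aggregation. This is not the paper's architecture. The `Fact` schema, the relation names, and the people listed are all hypothetical toy data, and the sketch assumes the hard part (reading the facts out of free text) has already been done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One structured reading of a natural-language fact (hypothetical schema)."""
    subject: str
    relation: str
    value: object


# Toy facts (invented for illustration). In a text-based database these
# would be free-text sentences rather than tidy triples.
FACTS = [
    Fact("Ana", "gender", "female"),
    Fact("Ana", "occupation", "athlete"),
    Fact("Ana", "birth_year", 1985),
    Fact("Ben", "gender", "male"),
    Fact("Ben", "occupation", "athlete"),
    Fact("Ben", "birth_year", 1990),
    Fact("Cleo", "gender", "female"),
    Fact("Cleo", "occupation", "athlete"),
    Fact("Cleo", "birth_year", 2003),
    Fact("Dee", "gender", "female"),
    Fact("Dee", "occupation", "writer"),
    Fact("Dee", "birth_year", 1950),
    Fact("Eva", "gender", "female"),
    Fact("Eva", "occupation", "athlete"),
    Fact("Eva", "birth_year", 1972),
]


def join_by_subject(facts):
    """Join: merge all facts that mention the same subject into one record."""
    records = {}
    for fact in facts:
        records.setdefault(fact.subject, {})[fact.relation] = fact.value
    return records


def female_athletes_born_in_20th_century(facts):
    """Filter the joined records and return the matching names, sorted."""
    records = join_by_subject(facts)
    return sorted(
        name
        for name, record in records.items()
        if record.get("gender") == "female"
        and record.get("occupation") == "athlete"
        # The 20th century spans 1901 through 2000 inclusive.
        and 1901 <= record.get("birth_year", 0) <= 2000
    )


names = female_athletes_born_in_20th_century(FACTS)  # the "List" variant
count = len(names)                                    # the "Count" variant
```

Answering the query needs three separate facts per person to be combined before any filtering can happen, and the final answer depends on every matching entity in the database. That dependence on the full set of relevant facts is what makes such queries hard for a model that can only encode a limited number of facts at once.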
