Data Distribution Bottlenecks in Grounding Language Models to Knowledge Bases
Yiheng Shu, Zhiwei Yu

TL;DR
This paper investigates the robustness challenges language models face when grounded to knowledge bases, showing that mismatches between training and inference data distributions limit their generalization and transferability even when data augmentation is applied.
Contribution
It provides an experimental analysis of robustness issues in language models for KBQA, emphasizing the impact of data distribution mismatches and proposing directions for future research.
Findings
Language models perform poorly under distribution shifts.
Data augmentation does not fully mitigate robustness issues.
Robustness in complex environments remains limited.
Abstract
Language models (LMs) have already demonstrated remarkable abilities in understanding and generating both natural and formal language. Despite these advances, their integration with real-world environments such as large-scale knowledge bases (KBs) remains underdeveloped, which hampers applications such as semantic parsing and leaves models prone to "hallucinated" information. This paper is an experimental investigation aimed at uncovering the robustness challenges that LMs encounter when tasked with knowledge base question answering (KBQA). The investigation covers scenarios with inconsistent data distribution between training and inference, such as generalization to unseen domains, adaptation to various language variations, and transferability across different datasets. Our comprehensive experiments reveal that even when employed with our proposed data augmentation techniques, advanced small…
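The scenarios named in the abstract all compare performance on data that matches the training distribution against data that does not. A minimal sketch of that kind of comparison follows, assuming exact-match scoring and a per-example split label. The field names (split, gold, pred), the split names, and the toy records are hypothetical illustrations, not the paper's evaluation code or data.

```python
from collections import defaultdict


def accuracy_by_split(examples):
    """Exact-match accuracy per split.

    `examples` is an iterable of dicts with the hypothetical keys
    'split', 'gold', and 'pred' (assumed for this sketch).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for ex in examples:
        totals[ex["split"]] += 1
        if ex["pred"] == ex["gold"]:
            correct[ex["split"]] += 1
    return {split: correct[split] / totals[split] for split in totals}


def robustness_gap(acc, reference="seen_domain", shifted="unseen_domain"):
    """Accuracy drop from the reference split to the shifted split."""
    return acc[reference] - acc[shifted]


# Toy records, invented purely to exercise the functions.
toy_examples = [
    {"split": "seen_domain", "gold": "A", "pred": "A"},
    {"split": "seen_domain", "gold": "B", "pred": "B"},
    {"split": "unseen_domain", "gold": "C", "pred": "C"},
    {"split": "unseen_domain", "gold": "D", "pred": "X"},
]

toy_acc = accuracy_by_split(toy_examples)
toy_gap = robustness_gap(toy_acc)
```

A larger gap indicates a larger loss under distribution shift. The same two functions could be reused for paraphrase or cross-dataset comparisons by changing the split labels.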
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Balanced Selection
