Data Distribution Bottlenecks in Grounding Language Models to Knowledge Bases

Yiheng Shu; Zhiwei Yu

arXiv:2309.08345·cs.CL·February 12, 2024



TL;DR

This paper investigates the robustness challenges faced by language models when integrating with knowledge bases, highlighting their limited generalization and transferability due to data distribution issues, despite data augmentation efforts.

Contribution

It provides an experimental analysis of robustness issues in language models for KBQA, emphasizing the impact of data distribution mismatches and proposing directions for future research.

Findings

01 Language models perform poorly under distribution shifts.

02 Data augmentation does not fully mitigate robustness issues.

03 Robustness in complex environments remains limited.

Abstract

Language models (LMs) have already demonstrated remarkable abilities in understanding and generating both natural and formal language. Despite these advances, their integration with real-world environments such as large-scale knowledge bases (KBs) remains underdeveloped, which hampers applications such as semantic parsing and leaves LMs prone to producing "hallucinated" information. This paper is an experimental investigation aimed at uncovering the robustness challenges that LMs encounter when tasked with knowledge base question answering (KBQA). The investigation covers scenarios in which the data distribution differs between training and inference, such as generalization to unseen domains, adaptation to various language variations, and transferability across different datasets. Our comprehensive experiments reveal that even when combined with our proposed data augmentation techniques, advanced small…


Code & Models

Repositories

yhshu/distribution-shifts-for-kbqa
none · Official

Datasets

yhshu/TIARA-GAIN
dataset · 8 dl


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Balanced Selection