Co-occurrence is not Factual Association in Language Models
Xiao Zhang, Miao Li, Ji Wu

TL;DR
This paper reveals that language models primarily learn co-occurrence patterns rather than true factual associations, and proposes strategies to enhance factual knowledge learning and generalization in these models.
Contribution
It identifies the layer-specific encoding of knowledge in language models and introduces methods to promote learning of factual associations over co-occurrence biases.
Findings
Training on implicit factual associations improves generalization.
Forgetting co-occurrence statistics enhances factual learning.
The proposed strategies improve reasoning performance on both synthetic and real data.
Abstract
Pretrained language models can encode a large amount of knowledge and utilize it for various reasoning tasks, yet they can still struggle to learn novel factual knowledge effectively from finetuning on limited textual demonstrations. In this work, we show that the reason for this deficiency is that language models are biased to learn word co-occurrence statistics instead of true factual associations. We identify the differences between two forms of knowledge representation in language models: knowledge in the form of co-occurrence statistics is encoded in the middle layers of the transformer model and does not generalize well to reasoning scenarios beyond simple question answering, while true factual associations are encoded in the lower layers and can be freely utilized in various reasoning tasks. Based on these observations, we propose two strategies to improve the learning of factual…
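To make the notion of "word co-occurrence statistics" concrete, here is a minimal sketch that counts how often two words appear in the same sentence. It is an illustration of the general concept only, not the paper's method or measurement; the toy corpus and the function name `cooccurrence_counts` are assumptions introduced for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each unordered word pair appears in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        # Deduplicate and sort so each pair has one canonical (alphabetical) key.
        words = sorted(set(sentence.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy corpus, not data from the paper.
corpus = [
    "paris is the capital of france",
    "france has paris as its capital",
    "berlin is the capital of germany",
]

counts = cooccurrence_counts(corpus)
print(counts[("france", "paris")])   # pair seen together in two sentences
print(counts[("berlin", "france")])  # pair never seen together
```

The counts are unordered and say nothing about which relation links the two words: "paris" and "france" are simply seen together often. A statistic like this can be enough to answer a direct question whose wording resembles the training text, which is the kind of knowledge the abstract contrasts with a true factual association.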
Taxonomy
Topics: Natural Language Processing Techniques
