How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis

Shaobo Li; Xiaoguang Li; Lifeng Shang; Zhenhua Dong; Chengjie Sun; Bingquan Liu; Zhenzhou Ji; Xin Jiang; Qun Liu

arXiv:2203.16747·cs.CL·April 1, 2022

Open Access

TL;DR

This paper investigates how pre-trained language models (PLMs) fill in missing factual words in cloze-style prompts. It finds that they rely more on positionally close and highly co-occurred words than on knowledge-dependent words, which suggests that PLMs capture factual knowledge ineffectively.

Contribution

The study introduces a causal-inspired analysis to quantitatively evaluate the reliance of PLMs on different word associations for factual knowledge retrieval.

Findings

01 PLMs depend more on positionally close and highly co-occurred words than on knowledge-dependent words.

02 Depending on knowledge-dependent words is more effective for generating the correct factual word than depending on positionally close or highly co-occurred words.

03 PLMs capture factual knowledge ineffectively because they rely on inadequate associations.

Abstract

Recently, there has been a trend to investigate the factual knowledge captured by Pre-trained Language Models (PLMs). Many works show the PLMs' ability to fill in the missing factual words in cloze-style prompts such as "Dante was born in [MASK]." However, it is still a mystery how PLMs generate the results correctly: relying on effective clues or shortcut patterns? We try to answer this question by a causal-inspired analysis that quantitatively measures and evaluates the word-level patterns that PLMs depend on to generate the missing words. We check the words that have three typical associations with the missing words: knowledge-dependent, positionally close, and highly co-occurred. Our analysis shows: (1) PLMs generate the missing factual words more by the positionally close and highly co-occurred words than the knowledge-dependent words; (2) the dependence on the knowledge-dependent…
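
The word-level dependence idea in the abstract can be illustrated with a small intervention experiment: mask one context word and see how much the probability of the correct answer drops. The following is a minimal sketch under stated assumptions, not the paper's estimator. `toy_predict` is a hypothetical stand-in for a PLM with made-up scores, and `dependence` is an illustrative name; neither comes from the paper.

```python
from typing import Callable, Dict, Sequence, Set

MASK = "[MASK]"


def toy_predict(tokens: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a PLM (assumption, not a real model).

    Returns a probability distribution over two candidate answers for the
    final [MASK]. The scores are invented purely for illustration.
    """
    score = {"Florence": 1.0, "Paris": 1.0}
    if "Dante" in tokens:
        score["Florence"] += 3.0  # the subject acts as a knowledge-dependent clue
    if "born" in tokens:
        score["Florence"] += 1.0  # the relation word contributes a little
    total = sum(score.values())
    return {word: s / total for word, s in score.items()}


def dependence(
    predict: Callable[[Sequence[str]], Dict[str, float]],
    tokens: Sequence[str],
    positions: Set[int],
    answer: str,
) -> float:
    """Drop in P(answer) when the tokens at `positions` are masked out."""
    base = predict(tokens)[answer]
    intervened = [MASK if i in positions else t for i, t in enumerate(tokens)]
    return base - predict(intervened)[answer]


tokens = ["Dante", "was", "born", "in", MASK]

# Intervene on the subject (index 0) versus the word next to the mask (index 3).
subject_effect = dependence(toy_predict, tokens, {0}, "Florence")
adjacent_effect = dependence(toy_predict, tokens, {3}, "Florence")
print(f"subject: {subject_effect:.4f}, adjacent word: {adjacent_effect:.4f}")
```

In this toy setup the prediction depends on the subject and not on the adjacent word, but that is only a property of the invented scores. With a real PLM, `toy_predict` would be replaced by the model's masked-token distribution, and the effects would be averaged over many prompts.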

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods