KMIR: A Benchmark for Evaluating Knowledge Memorization, Identification and Reasoning Abilities of Language Models

Daniel Gao; Yantao Jia; Lei Li; Chengzhen Fu; Zhicheng Dou; Hao Jiang; Xinyu Zhang; Lei Chen; Zhao Cao

arXiv:2202.13529 · cs.CL · March 1, 2022 · 1 cites

TL;DR

This paper introduces KMIR, a comprehensive benchmark to evaluate the knowledge memorization, identification, and reasoning abilities of pre-trained language models across various knowledge types, revealing their strengths and limitations.

Contribution

The paper presents KMIR, a new benchmark with 184,348 questions to assess key knowledge-related capabilities of PLMs, addressing gaps in evaluating their reliability as knowledge sources.

Findings

01 PLMs' memorization depends more on parameter count than training schemes

02 Current PLMs struggle with robust fact recall

03 Model compression retains knowledge but impairs reasoning and identification

Abstract

Previous works show the great potential of pre-trained language models (PLMs) for storing a large amount of factual knowledge. However, to figure out whether PLMs can be reliable knowledge sources and used as alternative knowledge bases (KBs), we need to further explore some critical features of PLMs. Firstly, knowledge memorization and identification abilities: traditional KBs can store various types of entities and relationships; do PLMs have a high knowledge capacity to store different types of knowledge? Secondly, reasoning ability: a qualified knowledge source should not only provide a collection of facts, but support a symbolic reasoner. Can PLMs derive new knowledge based on the correlations between facts? To evaluate these features of PLMs, we propose a benchmark, named Knowledge Memorization, Identification, and Reasoning test (KMIR). KMIR covers 3 types of knowledge, including…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification