Bidirectional LMs are Better Knowledge Memorizers? A Benchmark for Real-world Knowledge Injection

Yuwei Zhang; Wenhao Yu; Shangbin Feng; Yifan Zhu; Letian Peng; Jayanth Srinivasa; Gaowen Liu; Jingbo Shang

arXiv:2505.12306·cs.CL·May 20, 2025



TL;DR

This paper introduces WikiDYK, a large-scale, evolving benchmark for real-world knowledge injection into language models, revealing that bidirectional models outperform causal models in memorization and proposing a collaborative framework to enhance knowledge reliability.

Contribution

The paper presents WikiDYK, a novel benchmark for knowledge injection, demonstrates that bidirectional models memorize knowledge better than causal models, and proposes a modular ensemble framework to improve knowledge reliability.

Findings

01 Bidirectional LMs outperform causal LMs in knowledge memorization.

02 Ensemble framework improves knowledge reliability by up to 29.1%.

03 WikiDYK is a scalable, evolving benchmark for real-world knowledge testing.

Abstract

Despite significant advances in large language models (LLMs), their knowledge memorization capabilities remain underexplored, due to the lack of a standardized, high-quality test ground. In this paper, we introduce a novel, real-world, large-scale knowledge injection benchmark that evolves continuously over time without requiring human intervention. Specifically, we propose WikiDYK, which leverages recently added, human-written facts from Wikipedia's "Did You Know..." entries. These entries are carefully selected by expert Wikipedia editors based on criteria such as verifiability and clarity. Each entry is converted into multiple question-answer pairs spanning diverse task formats, from easy cloze prompts to complex multi-hop questions. WikiDYK contains 12,290 facts and 77,180 questions, and it is seamlessly extensible with future updates from Wikipedia editors. Extensive…
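
The simplest task format mentioned in the abstract, the cloze prompt, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' conversion pipeline: the helper `make_cloze`, the `QAPair` container, the `[MASK]` placeholder, and the example sentence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    """One question-answer pair derived from a single fact (hypothetical container)."""
    question: str
    answer: str


def make_cloze(fact: str, answer: str, mask: str = "[MASK]") -> QAPair:
    """Blank out the first occurrence of `answer` in `fact`.

    Hypothetical helper: the source does not describe how answer spans are chosen.
    """
    if answer not in fact:
        raise ValueError("answer span must appear verbatim in the fact")
    # Replace only the first occurrence so the rest of the fact stays intact.
    return QAPair(question=fact.replace(answer, mask, 1), answer=answer)


# Made-up sentence for illustration; not taken from WikiDYK.
pair = make_cloze("The river flows through the city of Exampleton.", "Exampleton")
print(pair.question)
print(pair.answer)
```

A model that has memorized the fact should fill the blank with the stored answer. The harder formats the abstract mentions, such as multi-hop questions, would need more than string substitution and are not covered by this sketch.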


Code & Models

Repositories

zhang-yu-wei/WikiDYK
PyTorch · Official

Datasets

YWZBrandon/wikidyk
dataset · 35 dl


Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks