LM-PUB-QUIZ: A Comprehensive Framework for Zero-Shot Evaluation of Relational Knowledge in Language Models
Max Ploner, Jacek Wiland, Sebastian Pohl, Alan Akbik

TL;DR
LM-PUB-QUIZ is an open-source Python framework that facilitates zero-shot evaluation of relational knowledge in language models using the BEAR probing method, supporting detailed analysis and integration with popular training tools.
Contribution
The paper introduces LM-PUB-QUIZ, an easy-to-use framework and leaderboard for probing relational knowledge in LMs. It builds on the authors' earlier BEAR probe and adds more detailed analysis and integration with training tools.
Findings
Enables a less biased comparison of relational knowledge across LMs, including models trained with causal and masked objectives
Supports integration with Hugging Face Transformers for streamlined evaluation
Provides detailed analysis of different knowledge types in language models
Abstract
Knowledge probing evaluates the extent to which a language model (LM) has acquired relational knowledge during its pre-training phase. It provides a cost-effective means of comparing LMs of different sizes and training setups and is useful for monitoring knowledge gained or lost during continual learning (CL). In prior work, we presented an improved knowledge probe called BEAR (Wiland et al., 2024), which enables the comparison of LMs trained with different pre-training objectives (causal and masked LMs) and addresses issues of skewed distributions in previous probes to deliver a more unbiased reading of LM knowledge. With this paper, we present LM-PUB-QUIZ, a Python framework and leaderboard built around the BEAR probing mechanism that enables researchers and practitioners to apply it in their work. It provides options for standalone evaluation and direct integration into the…
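Illustrative sketch
The abstract does not spell out the probing mechanism. As described in the BEAR paper, the probe fills a relation template with each of a fixed set of answer options and ranks the resulting statements by the (pseudo) log-likelihood the LM assigns to them, which is what lets causal and masked LMs be compared on the same items. The sketch below illustrates only that ranking idea. It does not use the LM-PUB-QUIZ API: the function names, the [X]/[Y] template placeholders, and the lookup-table scorer standing in for a real model are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def rank_answers(
    template: str,
    subject: str,
    options: List[str],
    score: Callable[[str], float],
) -> List[str]:
    """Fill the template with each answer option and sort options by statement score, best first."""
    statements = {
        opt: template.replace("[X]", subject).replace("[Y]", opt) for opt in options
    }
    return sorted(options, key=lambda opt: score(statements[opt]), reverse=True)


def make_toy_scorer(
    token_logprobs: Dict[str, float], unknown: float = -10.0
) -> Callable[[str], float]:
    """Hypothetical stand-in for an LM: sums per-token log-probabilities from a lookup table."""

    def score(sentence: str) -> float:
        tokens = sentence.lower().rstrip(".").split()
        return sum(token_logprobs.get(tok, unknown) for tok in tokens)

    return score


# Toy "model" that happens to prefer the correct answer.
toy_scorer = make_toy_scorer({"paris": -1.0, "berlin": -3.0, "rome": -4.0})

ranking = rank_answers(
    template="The capital of [X] is [Y].",
    subject="France",
    options=["Berlin", "Paris", "Rome"],
    score=toy_scorer,
)
print(ranking[0])
```

With a real model, the scorer would return the summed token log-probabilities of the statement for a causal LM, or a pseudo log-likelihood for a masked LM. Accuracy is then the fraction of items whose top-ranked option is the correct answer.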
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
