A Comprehensive Evaluation of Semantic Relation Knowledge of Pretrained Language Models and Humans

Zhihan Cao; Hiroaki Yamada; Simone Teufel; Takenobu Tokunaga

arXiv:2412.01131 · cs.CL · August 6, 2025

TL;DR

This paper introduces a comprehensive framework to evaluate and compare the semantic relation knowledge of pretrained language models and humans across five relations, revealing significant gaps, especially for less-studied relations.

Contribution

The paper develops a new evaluation framework that applies five metrics to six PLMs and to humans, covers five semantic relations beyond hypernymy, and provides a comparative analysis of their semantic relation knowledge.

Findings

01

Models perform worse than humans on all relations.

02

Causal language models do not generally outperform masked language models.

03

Antonymy is the relation where models perform best.

Abstract

Recently, much work has concerned itself with the enigma of what exactly pretrained language models (PLMs) learn about different aspects of language, and how they learn it. One stream of this type of research investigates the knowledge that PLMs have about semantic relations. However, many aspects of semantic relations were left unexplored. Generally, only one relation has been considered, namely hypernymy. Furthermore, previous work did not measure humans' performance on the same task as that performed by the PLMs. This means that at this point in time, there is only an incomplete view of the extent of these models' semantic relation knowledge. To address this gap, we introduce a comprehensive evaluation framework covering five relations beyond hypernymy, namely hyponymy, holonymy, meronymy, antonymy, and synonymy. We use five metrics (two newly introduced here) for recently untreated…
