HyperLex: A Large-Scale Evaluation of Graded Lexical Entailment
Ivan Vulić, Daniela Gerz, Douwe Kiela, Felix Hill, and Anna Korhonen

TL;DR
HyperLex presents a large dataset and evaluation framework that captures the gradual nature of lexical entailment, revealing gaps between human judgments and current models, and guiding future improvements.
Contribution
Introduces HyperLex, a large-scale dataset for graded lexical entailment, and compares human judgments with model predictions to highlight discrepancies and future directions.
Findings
Human judgments show lexical entailment is gradual, not binary.
Current models significantly lag behind human performance.
Substantial differences exist among different automatic systems.
Abstract
We introduce HyperLex - a dataset and evaluation resource that quantifies the extent of the semantic category membership, that is, the type-of relation, also known as the hyponymy-hypernymy or lexical entailment (LE) relation, between 2,616 concept pairs. Cognitive psychology research has established that typicality and category/class membership are computed in human semantic memory as a gradual rather than binary relation. Nevertheless, most NLP research and existing large-scale inventories of concept category membership (WordNet, DBPedia, etc.) treat category membership and LE as binary. To address this, we asked hundreds of native English speakers to indicate typicality and strength of category membership between a diverse range of concept pairs on a crowdsourcing platform. Our results confirm that category membership and LE are indeed more gradual than binary. We then compare these human…
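As an illustration of how graded human ratings can be compared with model predictions, the sketch below scores a model against human judgments using Spearman's rank correlation. This is a minimal sketch under stated assumptions: the excerpt above does not name the metric, and the concept pairs, ratings, model scores, and function names are all hypothetical, not taken from the HyperLex data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical graded ratings for (hyponym, hypernym) pairs. The direction
# matters: "dog is a type of animal" should score higher than the reverse.
human = {
    ("dog", "animal"): 5.8,
    ("animal", "dog"): 1.2,
    ("chair", "furniture"): 5.1,
    ("car", "banana"): 0.1,
}
# Hypothetical model scores for the same pairs (any monotone scale works,
# because only the ranking of the pairs enters the correlation).
model = {
    ("dog", "animal"): 0.91,
    ("animal", "dog"): 0.40,
    ("chair", "furniture"): 0.95,
    ("car", "banana"): 0.05,
}

pairs = list(human)
rho = spearman_rho([human[p] for p in pairs], [model[p] for p in pairs])
print(f"Spearman's rho between human and model scores: {rho:.2f}")
```

A rank-based measure suits graded judgments because it rewards a model for ordering pairs the way humans do, without requiring its scores to lie on the same numeric scale as the ratings.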
