CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning
Masato Kikuchi, Masatsugu Ono, Toshioki Soga, Tetsu Tanabe, Tadachika Ozono

TL;DR
This paper introduces a CEFR-annotated WordNet, created using large language models, to better align semantic sense distinctions with language proficiency levels, aiding language learning and NLP tasks.
Contribution
It presents a novel automated method to annotate WordNet with CEFR levels using LLMs, and develops classifiers that perform well on proficiency-level prediction.
Findings
Classifiers achieve a Macro-F1 score of 0.81.
Models fine-tuned on the corpus perform comparably to models fine-tuned on gold-standard annotations.
Annotated resources are publicly available for NLP and language education.
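For readers unfamiliar with the metric, Macro-F1 is the unweighted mean of the per-class F1 scores, so every CEFR level counts equally regardless of how often it occurs. The sketch below is a minimal illustration of that standard definition; the function name `macro_f1` and the toy labels are assumptions for illustration and are not taken from the paper's code or data.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels only (not the paper's data)
gold = ["A1", "A1", "B1", "B2"]
pred = ["A1", "B1", "B1", "B2"]
print(round(macro_f1(gold, pred), 4))
```

Because each level contributes equally, a classifier that does well only on frequent levels scores lower on Macro-F1 than on plain accuracy.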
Abstract
Although WordNet is a valuable resource because of its structured semantic networks and extensive vocabulary, its fine-grained sense distinctions can be challenging for second-language learners. To address this issue, we developed a version of WordNet annotated with the Common European Framework of Reference for Languages (CEFR), integrating its semantic networks with language-proficiency levels. We automated this process using a large language model to measure the semantic similarity between sense definitions in WordNet and entries in the English Vocabulary Profile Online. To validate our approach, we constructed a large-scale corpus containing both sense and CEFR-level information from the annotated WordNet and used it to develop contextual lexical classifiers. Our experiments demonstrate that models fine-tuned on this corpus perform comparably to those fine-tuned on gold-standard…
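The matching step described in the abstract can be sketched roughly as follows: compare a WordNet sense definition against candidate profile entries for the same word and transfer the CEFR level of the closest one. This is only an illustration under loud assumptions. The paper uses a large language model to measure similarity, whereas this sketch swaps in a plain bag-of-words cosine so that it runs without external dependencies. The names `cosine` and `assign_level`, the threshold value, and the example definitions and levels are all hypothetical placeholders, not real English Vocabulary Profile entries.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Bag-of-words cosine similarity (a stand-in for LLM-based similarity)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def assign_level(sense_definition, profile_entries, threshold=0.2):
    """Return (level, score) of the most similar entry, or (None, score)
    if no entry reaches the (hypothetical) similarity threshold."""
    best_level, best_score = None, 0.0
    for definition, level in profile_entries:
        score = cosine(sense_definition, definition)
        if score > best_score:
            best_level, best_score = level, score
    if best_score >= threshold:
        return best_level, best_score
    return None, best_score


# Invented placeholder entries for the word "bank" (levels are made up)
entries = [
    ("an organization where people keep and borrow money", "A2"),
    ("the land along the side of a river", "B1"),
]
sense = "a financial institution that accepts deposits and lends money"
level, score = assign_level(sense, entries)
print(level, round(score, 3))
```

Leaving a sense unannotated when no entry clears the threshold is a design choice of this sketch; how the authors handle senses without a close match is not stated in the text shown here.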
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Second Language Acquisition and Learning
