Presence or Absence: Are Unknown Word Usages in Dictionaries?
Xianghe Ma, Dominik Schlechtweg, Wei Zhao

TL;DR
This paper presents an unsupervised system that detects word usages not covered by dictionary sense inventories and generates definitions for them, bridging lexical semantic change detection and lexicography. The system is evaluated on Finnish, Russian, and German in the AXOLOTL-24 shared task.
Contribution
It uses a graph-based clustering approach to map unknown word usages to dictionary entries, and large language models such as GPT-4 to generate dictionary-like definitions for usages that match no existing entry, with the aim of supporting dictionary updates.
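To make the idea of mapping usages to dictionary entries through a graph more concrete, here is a minimal sketch. It is not the authors' implementation: the summary does not say which clustering algorithm or embeddings they use, so this example assumes toy hand-written vectors, a cosine-similarity threshold (the hypothetical parameter `threshold`), and plain connected components as a stand-in for the clustering step. The function name `map_usages_to_senses` and the sense and usage identifiers are likewise illustrative.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def map_usages_to_senses(sense_vecs, usage_vecs, threshold=0.8):
    """Assign each usage to a dictionary sense or to a 'novel_k' cluster.

    Assumption: an edge joins two nodes when their cosine similarity is at
    least `threshold`, and clusters are the connected components of that graph.
    """
    vecs = {("sense", s): v for s, v in sense_vecs.items()}
    vecs.update({("usage", u): v for u, v in usage_vecs.items()})
    nodes = list(vecs)

    # Build the similarity graph as an adjacency map.
    adj = {n: set() for n in nodes}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if cosine(vecs[a], vecs[b]) >= threshold:
                adj[a].add(b)
                adj[b].add(a)

    mapping, seen, novel_count = {}, set(), 0
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)

        senses = [name for kind, name in component if kind == "sense"]
        usages = [name for kind, name in component if kind == "usage"]
        if not usages:
            continue
        if senses:
            # A cluster containing dictionary senses: pick the closest one.
            for u in usages:
                mapping[u] = max(
                    senses, key=lambda s: cosine(usage_vecs[u], sense_vecs[s])
                )
        else:
            # No dictionary sense in the cluster: treat it as a novel sense.
            label = f"novel_{novel_count}"
            novel_count += 1
            for u in usages:
                mapping[u] = label
    return mapping


# Toy data (made up for illustration only).
sense_vecs = {"bank_river": [1.0, 0.0, 0.0], "bank_money": [0.0, 1.0, 0.0]}
usage_vecs = {
    "u1": [0.9, 0.1, 0.0],
    "u2": [0.1, 0.95, 0.0],
    "u3": [0.0, 0.1, 1.0],
    "u4": [0.05, 0.0, 0.9],
}
mapping = map_usages_to_senses(sense_vecs, usage_vecs)
print(mapping)
```

In this toy setup, usages that land in a cluster with no dictionary sense are the ones that would be passed on to a definition-generation step; how the actual system makes that decision is not specified in this summary.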
Findings
Outperforms baseline in sense mapping accuracy
Ranks first in Finnish and German, second in Russian
Shows potential for dictionary updates with novel senses
Abstract
There has been a surge of interest in computational modeling of semantic change. Previous work has focused on detecting and interpreting word senses gained over time; however, it remains unclear whether the gained senses are covered by dictionaries. In this work, we aim to fill this research gap by comparing detected word senses with dictionary sense inventories in order to bridge the communities of lexical semantic change detection and lexicography. We evaluate our system in the AXOLOTL-24 shared task for Finnish, Russian, and German (Fedorova et al., 2024). Our system is fully unsupervised. It leverages a graph-based clustering approach to predict mappings between unknown word usages and dictionary entries for Subtask 1, and generates dictionary-like definitions for those novel word usages using state-of-the-art large language models such as GPT-4…
Taxonomy
Topics: Lexicography and Language Studies · Linguistics and Terminology Studies · Historical Linguistics and Language Studies
Methods: Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention
