Monolingual alignment of word senses and definitions in lexicographical resources
Sina Ahmadi

TL;DR
This thesis develops methods for aligning monolingual lexicographical data, creates a benchmark for evaluating them, and extends the work to translation inference to aid resource creation for under-resourced languages.
Contribution
Introduces a new benchmark for monolingual word sense alignment and proposes methods for translation inference using graph analysis, with a practical tool implementation.
Findings
Benchmark contains 17 datasets across 15 languages.
Alignment techniques show varying performance on the benchmark.
Unsupervised translation inference improves lexicon coverage for low-resource languages.
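Translation inference can be pictured as finding new edges in a graph. In that graph, nodes are (language, lemma) pairs and edges come from existing bilingual dictionaries. The sketch below uses the simplest such heuristic: it proposes a target-language word when that word is reachable from the source word through a pivot in a third language. Candidates are then ranked by how many distinct pivots support them. This is an assumed, illustrative baseline, not the graph-analysis method of the thesis. The function names `build_graph` and `infer_translations`, the `min_pivots` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected translation graph; nodes are (language, lemma) tuples."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def infer_translations(graph, word, target_lang, min_pivots=1):
    """Rank target-language words reachable from `word` through one pivot node.

    Hypothetical pivot-based heuristic: a candidate's score is the number of
    distinct pivot nodes linking it to `word`. Existing translations are skipped.
    """
    source_lang = word[0]
    known = graph.get(word, set())
    support = defaultdict(set)  # candidate -> pivots connecting it to `word`
    for pivot in known:
        if pivot[0] in (source_lang, target_lang):
            continue  # only pivot through a third language
        for cand in graph[pivot]:
            if cand[0] == target_lang and cand not in known:
                support[cand].add(pivot)
    # Most-supported candidates first; ties broken alphabetically for determinism
    ranked = sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(cand, len(pivots)) for cand, pivots in ranked if len(pivots) >= min_pivots]


if __name__ == "__main__":
    toy_pairs = [
        (("es", "perro"), ("en", "dog")),
        (("en", "dog"), ("fr", "chien")),
        (("en", "dog"), ("fr", "cabot")),
        (("es", "perro"), ("ca", "gos")),
        (("ca", "gos"), ("fr", "chien")),
    ]
    g = build_graph(toy_pairs)
    print(infer_translations(g, ("es", "perro"), "fr"))
    # → [(('fr', 'chien'), 2), (('fr', 'cabot'), 1)]
```

Requiring support from several independent pivots (`min_pivots=2` here keeps only `chien`) is one way to filter out candidates introduced by ambiguous pivot words.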
Abstract
The focus of this thesis is broadly on the alignment of lexicographical data, particularly dictionaries. To tackle some of the challenges in this field, two main tasks are addressed: word sense alignment and translation inference. The first task aims to find an optimal alignment given the sense definitions of a headword in two different monolingual dictionaries. This is challenging, especially due to differences in sense granularity, coverage and description between the two resources. After describing the characteristics of various lexical semantic resources, we introduce a benchmark containing 17 datasets in 15 languages where monolingual word senses and definitions are manually annotated across different resources by experts. In the creation of the benchmark, lexicographers' knowledge is incorporated through the annotations where a semantic relation, namely exact, narrower,…
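As a rough illustration of the first task, the sketch below aligns the senses of one headword across two dictionaries. It scores every pair of definitions and keeps the one-to-one matching with the highest total score. This is not the method of the thesis. The bag-of-words Jaccard similarity, the exhaustive search, the `threshold` parameter and the names `align_senses` and `jaccard` are illustrative assumptions. It also models only exact matches, not relations such as narrower.

```python
from __future__ import annotations

from itertools import permutations


def tokens(definition: str) -> set[str]:
    """Lowercased bag of words with surrounding punctuation stripped."""
    words = (w.strip(".,;:()").lower() for w in definition.split())
    return {w for w in words if w}


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap of two definitions' word sets (assumed similarity measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def align_senses(senses_a: list[str], senses_b: list[str], threshold: float = 0.1) -> list[tuple[int, int]]:
    """Return (i, j) pairs forming the one-to-one alignment with maximal total similarity.

    Each sense of A may map to a distinct sense of B or stay unaligned (None);
    pairs scoring below `threshold` are treated as unaligned.
    Exhaustive search: only practical for a handful of senses per headword.
    """
    na, nb = len(senses_a), len(senses_b)
    sim = [[jaccard(a, b) for b in senses_b] for a in senses_a]
    candidates = list(range(nb)) + [None] * na  # None lets a sense stay unaligned
    best_score, best = -1.0, []
    for choice in permutations(candidates, na):
        pairs = [(i, j) for i, j in enumerate(choice)
                 if j is not None and sim[i][j] >= threshold]
        score = sum(sim[i][j] for i, j in pairs)
        if score > best_score:
            best_score, best = score, pairs
    return best


if __name__ == "__main__":
    dict_a = ["a domestic animal that barks", "a contemptible person"]
    dict_b = ["a person regarded as unpleasant or contemptible",
              "a domesticated carnivorous animal that barks"]
    print(align_senses(dict_a, dict_b))  # → [(0, 1), (1, 0)]
```

The exhaustive search grows factorially with the number of senses. For larger inventories, an assignment solver such as the Hungarian algorithm (for example SciPy's `linear_sum_assignment`) would find the same kind of one-to-one matching efficiently. Richer relations like narrower or broader would need a classifier rather than a single similarity score.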
Taxonomy
Topics: Natural Language Processing Techniques · Lexicography and Language Studies · Linguistics and Terminology Studies
