Adapting Definition Modeling for New Languages: A Case Study on Belarusian
Daniela Kazakouskaya, Timothee Mickus, Janine Siewert

TL;DR
This paper adapts definition modeling to Belarusian and shows that existing models can be transferred with minimal data, though current automatic metrics do not capture every aspect of definition quality.
Contribution
The study introduces a new Belarusian dataset of 43,150 definitions for definition modeling and shows that existing models can be adapted to an as-yet unsupported language with limited data.
Findings
Effective adaptation with minimal data
Gaps identified in automatic metric evaluations
New Belarusian definition dataset created
Abstract
Definition modeling, the task of generating new definitions for words in context, holds great promise as a means to assist the work of lexicographers in documenting a broader variety of lects and languages, yet much remains to be done to assess how we can leverage pre-existing models for as-of-yet unsupported languages. In this work, we focus on adapting existing models to Belarusian, for which we propose a novel dataset of 43,150 definitions. Our experiments demonstrate that adapting a definition modeling system requires minimal amounts of data, but that there are currently gaps in what automatic metrics capture.
Taxonomy
Topics: linguistics and terminology studies
