Multi-label Scandinavian Language Identification (SLIDE)
Mariia Fedorova, Jonas Sebulon Frydenberg, Victoria Handford, Victoria Ovedie Chruickshank Langø, Solveig Helene Willoch, Marthe Løken Midtgaard, Yves Scherrer, Petter Mæhlum, David Samuel

TL;DR
This paper introduces SLIDE, a new dataset and a suite of models for multi-label sentence-level language identification among Scandinavian languages. It shows that allowing a single sentence to carry more than one language label is necessary for accurate identification.
Contribution
The paper presents SLIDE, a manually curated multi-label dataset, and a suite of LID models with different speed-accuracy tradeoffs, along with a novel training approach for multi-label language identification.
Findings
Multi-label identification improves accuracy over single-label methods.
SLIDE dataset enables better evaluation of Scandinavian LID models.
Proposed models demonstrate effective multi-language detection in Scandinavian texts.
Abstract
Identifying closely related languages at sentence level is difficult, in particular because it is often impossible to assign a sentence to a single language. In this paper, we focus on multi-label sentence-level Scandinavian language identification (LID) for Danish, Norwegian Bokmål, Norwegian Nynorsk, and Swedish. We present the Scandinavian Language Identification and Evaluation, SLIDE, a manually curated multi-label evaluation dataset and a suite of LID models with varying speed-accuracy tradeoffs. We demonstrate that the ability to identify multiple languages simultaneously is necessary for any accurate LID method, and present a novel approach to training such multi-label LID models.
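The abstract's central point, that one sentence may legitimately belong to several of these languages at once, changes how a classifier's output is decoded and scored. Below is a minimal Python sketch that assumes a model emitting an independent probability per language (for example, sigmoid outputs). The function names, the codes `da`, `nb`, `nn`, `sv`, the 0.5 threshold, and the scores are all hypothetical illustrations. They are not the SLIDE authors' models, training method, or evaluation protocol.

```python
# Hypothetical illustration: language codes da, nb, nn, sv stand for
# Danish, Norwegian Bokmål, Norwegian Nynorsk and Swedish.

def single_label(scores: dict[str, float]) -> set[str]:
    """Single-label decoding: keep only the top-scoring language."""
    return {max(scores, key=scores.get)}


def multi_label(scores: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Multi-label decoding: keep every language whose independent score
    clears the (assumed) threshold; fall back to the argmax so that every
    sentence receives at least one label."""
    labels = {lang for lang, p in scores.items() if p >= threshold}
    return labels or single_label(scores)


def exact_match(pred: list[set[str]], gold: list[set[str]]) -> float:
    """Fraction of sentences whose predicted label set equals the gold set."""
    if len(pred) != len(gold):
        raise ValueError("pred and gold must have the same length")
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


# Made-up scores for a sentence that is valid in both Bokmål and Nynorsk.
scores = {"da": 0.10, "nb": 0.90, "nn": 0.80, "sv": 0.05}
gold = {"nb", "nn"}
print(single_label(scores) == gold)  # False: the Nynorsk label is dropped
print(multi_label(scores) == gold)   # True
```

Exact set match is only one possible multi-label metric. Per-label precision and recall are a common alternative that gives partial credit when a prediction overlaps the gold set without equalling it.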
