TL;DR
This paper introduces a new SVM-based language identification (LID) system for the Romansh idioms (regional varieties). It reaches an average in-domain accuracy of 97%, which is high enough to support applications such as idiom-aware spell checking and machine translation.
Contribution
The paper presents the first documented LID system for the Romansh idioms that also recognizes the supra-regional Rumantsch Grischun. It comes with a newly curated benchmark dataset and a publicly available classifier.
Findings
Achieved an average in-domain accuracy of 97% on a newly curated benchmark spanning two domains.
Successfully distinguished between Romansh idioms and Rumantsch Grischun.
Reached accuracy sufficient to enable applications such as idiom-aware spell checking and machine translation.
Abstract
The Romansh language has several regional varieties, called idioms, which sometimes have limited mutual intelligibility. Despite this linguistic diversity, there has been a lack of documented efforts to build a language identification (LID) system that can distinguish between these idioms. Since Romansh LID should also be able to recognize Rumantsch Grischun, a supra-regional variety that combines elements of several idioms, this makes for a novel and interesting classification problem. In this paper, we present a LID system for Romansh idioms based on an SVM approach. We evaluate our model on a newly curated benchmark across two domains and find that it reaches an average in-domain accuracy of 97%, enabling applications such as idiom-aware spell checking or machine translation. Our classifier is publicly available.
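The abstract says only that the classifier is SVM-based. It does not describe the features or the training procedure. The code below is a minimal, hypothetical sketch of how such an LID classifier can work, and it is not the authors' released classifier. It assumes character 1 to 3-gram features and one linear SVM per label (one-vs-rest), each trained with Pegasos-style subgradient updates on the hinge loss. The labels `idiom_1`, `idiom_2`, and `idiom_3` and the training strings are toy placeholders, not Romansh text.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count character n-grams of a lowercased text padded with one space per side.

    Whitespace-only n-grams are skipped because they carry no idiom signal.
    """
    padded = f" {text.lower()} "
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            if gram.strip():
                grams[gram] += 1
    return grams


def l2_normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


class LinearSVMLID:
    """Hypothetical one-vs-rest linear SVM (no bias term) trained with Pegasos-style updates."""

    def __init__(self, lam=0.1, epochs=20):
        self.lam = lam          # L2 regularization strength
        self.epochs = epochs    # passes over the data for each binary classifier
        self.weights = {}       # label -> sparse weight vector

    def fit(self, texts, labels):
        xs = [l2_normalize(char_ngrams(t)) for t in texts]
        for label in sorted(set(labels)):
            ys = [1.0 if y == label else -1.0 for y in labels]
            w, t = {}, 0
            for _ in range(self.epochs):
                for x, y in zip(xs, ys):  # fixed order keeps the sketch deterministic
                    t += 1
                    eta = 1.0 / (self.lam * t)
                    violated = y * dot(w, x) < 1.0  # hinge loss is active
                    # Regularization step: shrink w by (1 - eta * lam) = 1 - 1/t.
                    shrink = 1.0 - 1.0 / t
                    for f in w:
                        w[f] *= shrink
                    if violated:
                        # Hinge-loss subgradient step toward the correct side.
                        for f, v in x.items():
                            w[f] = w.get(f, 0.0) + eta * y * v
            self.weights[label] = w
        return self

    def predict(self, text):
        """Return the label whose binary SVM gives the highest score."""
        x = l2_normalize(char_ngrams(text))
        return max(self.weights, key=lambda label: dot(self.weights[label], x))


if __name__ == "__main__":
    # Toy placeholder data with disjoint alphabets per label (not real Romansh).
    train_texts = ["abc cab", "bca abc", "def fed", "efd def", "ghi ihg", "hig ghi"]
    train_labels = ["idiom_1", "idiom_1", "idiom_2", "idiom_2", "idiom_3", "idiom_3"]
    clf = LinearSVMLID().fit(train_texts, train_labels)
    print(clf.predict("cab bca"))  # prints "idiom_1"
```

The sketch omits a bias term and visits examples in a fixed order so that it stays short and deterministic. A practical system would more likely use a tuned library SVM, for example scikit-learn's `LinearSVC`, trained on real idiom-labelled data such as the paper's benchmark.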
