TL;DR
This paper introduces a dialect normalization method for Finland Swedish dialects that reduces word error rate from 76.45 to 28.58, and it links the effectiveness of training with one word at a time to the size of the available training data.
Contribution
The study presents a new dialect normalization approach for Swedish dialects in Finland and shows that training with one word at a time yields the best results, contrary to prior Finnish dialect research.
Findings
Best model reduced word error rate from 76.45 to 28.58
Training with one word at a time was most effective
Models are available as a Python package
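For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate can be computed from a reference and a model output. It assumes the figures above are percentages and that WER is the word-level edit distance divided by the reference length. The function name `word_error_rate` and the toy tokens are illustrative and are not taken from the paper or its package.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance / reference length, as a percentage."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Toy placeholder tokens, not real dialect data.
score = word_error_rate("w1 w2 w3 w4", "w1 wX w3 w4")
print(score)
```

One wrong word out of four gives a score of 25.0. A score above 100 is possible when the output contains many extra words.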
Abstract
Our study presents a dialect normalization method for different Finland Swedish dialects covering six regions. We tested five different models, and the best model improved the word error rate from 76.45 to 28.58. Contrary to results reported in earlier research on Finnish dialects, we found that training the model with one word at a time gave the best results. We believe this is due to the size of the training data available for the model. Our models are accessible as a Python package. The study provides important information about the adaptability of these methods in different contexts and gives important baselines for further study.
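To make "training with one word at a time" concrete, the sketch below shows one way to split word-aligned dialect and standard text into training pairs of a chosen size. It is an illustration only: the helper `chunk_words`, the `chunk_size` parameter, and the assumption that the two sides are aligned word by word are hypothetical and are not confirmed by the abstract.

```python
from typing import List, Tuple


def chunk_words(
    dialect_tokens: List[str],
    standard_tokens: List[str],
    chunk_size: int = 1,
) -> List[Tuple[str, str]]:
    """Split word-aligned token lists into (source, target) training pairs.

    chunk_size=1 yields one word per training example; larger values
    yield multi-word chunks.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if len(dialect_tokens) != len(standard_tokens):
        raise ValueError("token lists must be aligned word by word")

    pairs = []
    for start in range(0, len(dialect_tokens), chunk_size):
        source = " ".join(dialect_tokens[start:start + chunk_size])
        target = " ".join(standard_tokens[start:start + chunk_size])
        pairs.append((source, target))
    return pairs


# Placeholder tokens stand in for dialect (d*) and normalized (s*) words.
dialect = ["d1", "d2", "d3"]
standard = ["s1", "s2", "s3"]

one_word_pairs = chunk_words(dialect, standard, chunk_size=1)
two_word_pairs = chunk_words(dialect, standard, chunk_size=2)
print(one_word_pairs)
print(two_word_pairs)
```

With `chunk_size=1`, each sentence contributes as many training examples as it has words. That gives the model more, shorter examples from the same amount of text, which may matter when data is limited.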
