Normalization of Different Swedish Dialects Spoken in Finland

Mika Hämäläinen; Niko Partanen; Khalid Alnajjar

arXiv:2012.05318·cs.CL·December 11, 2020

TL;DR

This paper introduces a normalization method for Finland Swedish dialects. The best of five tested models lowered the word error rate from 76.45 to 28.58, and the authors attribute the advantage of word-by-word training to the size of the available training data.

Contribution

The study presents a normalization approach for Finland Swedish dialects covering six regions and shows that training on one word at a time yields the best results, contrary to earlier research on Finnish dialects.

Findings

01 Best model reduced word error rate from 76.45 to 28.58

02 Training with one word at a time was most effective

03 Models are available as a Python package
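The "one word at a time" finding concerns how aligned dialectal and normalized text is cut into training examples. The following is a minimal sketch of such chunking, not the authors' preprocessing: the function names `to_chars` and `make_chunks`, the character-level tokenization, and the `_` word separator are all assumptions made for illustration, and the sample tokens are placeholders rather than real dialect data.

```python
from typing import List, Tuple


def to_chars(words: List[str]) -> str:
    """Split each word into space-separated characters.

    Words are joined with " _ " (an assumed word-boundary marker).
    """
    return " _ ".join(" ".join(word) for word in words)


def make_chunks(
    source_words: List[str],
    target_words: List[str],
    chunk_size: int = 1,
) -> List[Tuple[str, str]]:
    """Cut word-aligned source/target sequences into training pairs.

    chunk_size=1 corresponds to training with one word at a time;
    larger values put several consecutive words into one example.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if len(source_words) != len(target_words):
        raise ValueError("source and target must be aligned word by word")

    pairs = []
    for start in range(0, len(source_words), chunk_size):
        src = source_words[start:start + chunk_size]
        tgt = target_words[start:start + chunk_size]
        pairs.append((to_chars(src), to_chars(tgt)))
    return pairs


# Placeholder tokens, not real dialect data.
dialect = ["ab", "cd", "e"]
normalized = ["xb", "cd", "f"]

word_level = make_chunks(dialect, normalized, chunk_size=1)
two_word = make_chunks(dialect, normalized, chunk_size=2)
```

With `chunk_size=1` every example holds a single word pair, so the same amount of text yields more, shorter training examples than with larger chunks.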

Abstract

Our study presents a dialect normalization method for different Finland Swedish dialects covering six regions. We tested five different models, and the best model improved the word error rate from 76.45 to 28.58. Contrary to results reported in earlier research on Finnish dialects, we found that training the model with one word at a time gave the best results. We believe this is due to the size of the training data available for the model. Our models are accessible as a Python package. The study provides important information about the adaptability of these methods in different contexts and gives important baselines for further study.
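Word error rate, the metric reported above, is conventionally the word-level edit distance between a system output and its reference, divided by the number of reference words. The sketch below illustrates that standard definition on a percentage scale. It is not the authors' evaluation script, and the function name `word_error_rate` and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Placeholder sentences: one substituted word out of four.
example = word_error_rate("a b c d", "a x c d")
```

Under this definition, the drop from 76.45 to 28.58 reported in the abstract is a reduction of 47.87 points.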

Code & Models

Repositories

mikahama/murre (official)
