Finnish Dialect Identification: The Effect of Audio and Text
Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter

TL;DR
This paper introduces an automatic dialect identification method for Finnish that uses both audio and text data. Combining the two modalities reaches 85% accuracy across 23 dialects, and the code, models, and data are openly available.
Contribution
The paper presents the first approach to automatic Finnish dialect identification from transcripts and audio, and shows that adding audio raises accuracy from 57% (text only) to 85%.
Findings
Text-only accuracy is 57%.
Audio and text combined accuracy is 85%.
Code, models, and data are openly available on GitHub and Zenodo.
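
To put the two reported accuracies in perspective, the gap can be worked out directly. The following is a minimal Python sketch that uses only the 57% and 85% figures and the count of 23 dialects from this page. The function name `accuracy_gain` is illustrative, and the chance-level figure assumes uniform guessing over equally frequent dialects, which the source does not state.

```python
def accuracy_gain(baseline, improved):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = improved - baseline
    relative = absolute / baseline * 100
    return absolute, relative


text_only = 57.0       # reported text-only accuracy (%)
text_and_audio = 85.0  # reported text + audio accuracy (%)
num_dialects = 23      # number of dialects in the dataset

absolute, relative = accuracy_gain(text_only, text_and_audio)

# Assumption: uniform random guessing over equally frequent dialects.
chance_level = 100 / num_dialects

print(f"Absolute gain: {absolute:.0f} percentage points")  # 28
print(f"Relative gain: {relative:.1f}%")                   # 49.1%
print(f"Chance level (uniform guess): {chance_level:.1f}%")  # 4.3%
```

Under these assumptions, adding audio yields a gain of 28 percentage points, roughly a 49% relative improvement over the text-only result.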
Abstract
Finnish is a language with multiple dialects that differ from each other not only in accent (pronunciation) but also in morphological forms and lexical choice. We present the first approach to automatically detect the dialect of a speaker, based either on a dialect transcript alone or on a transcript paired with its audio recording, using a dataset covering 23 different dialects. Our results show that the best accuracy is achieved by combining both modalities: text alone reaches an overall accuracy of 57%, whereas text and audio together reach 85%. Our code, models, and data have been released openly on GitHub and Zenodo.
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Authorship Attribution and Profiling
