FROST-EMA: Finnish and Russian Oral Speech Dataset of Electromagnetic Articulography Measurements with L1, L2 and Imitated L2 Accents

Satu Hopponen; Tomi Kinnunen; Alexandre Nikolaev; Rosa González Hautamäki; Lauri Tavi; Einar Meister

arXiv:2506.08981·cs.CL·June 11, 2025

Open Access

TL;DR

This paper introduces FROST-EMA, a bilingual Finnish and Russian speech dataset with electromagnetic articulography (EMA) measurements, recorded from speakers in their native language (L1), second language (L2), and an imitated L2 accent. It supports research on language variability, accents, and speaker verification.

Contribution

The paper presents a new bilingual electromagnetic articulography dataset with speech in L1, L2, and imitated L2, along with initial case studies demonstrating its research potential.

Findings

01. L2 and imitated L2 affect speaker verification performance
02. Articulatory patterns differ across L1, L2, and fake accents
03. Dataset supports phonetic and technological research

Abstract

We introduce FROST-EMA (Finnish and Russian Oral Speech Dataset of Electromagnetic Articulography), a new corpus. It contains recordings from 18 bilingual speakers, who produced speech in their native language (L1), second language (L2), and imitated L2 (fake foreign accent). The new corpus enables research into language variability from phonetic and technological points of view. Accordingly, we include two preliminary case studies to demonstrate both perspectives. The first case study explores the impact of L2 and imitated L2 on the performance of an automatic speaker verification system, while the second illustrates the articulatory patterns of one speaker in L1, L2, and a fake accent.

Taxonomy

Topics: Phonetics and Phonology Research · Emotion and Mood Recognition · Speech Recognition and Synthesis