Investigating Affect Mining Techniques for Annotation Sample Selection in the Creation of Finnish Affective Speech Corpus

Kalle Lahtinen; Einari Vaaras; Liisa Mustanoja; Okko Räsänen

arXiv:2505.17833·cs.CL·May 26, 2025

TL;DR

This paper introduces the first spontaneous Finnish affective speech corpus created through an affect mining sampling approach, enhancing diversity in emotional annotation for speech research.

Contribution

The paper presents a novel affect mining-based sample selection method for corpus creation, applies it to spontaneous Finnish speech, and compares it with random sampling in terms of annotation diversity.

Findings

01 Affect mining sampling increased annotation diversity.

02 The corpus contains 12,000 utterances annotated for emotional arousal and valence.

03 Sampling strategies can significantly impact affective speech corpus quality.

Abstract

Study of affect in speech requires suitable data, as emotional expression and perception vary across languages. Until now, no corpus has existed for natural expression of affect in spontaneous Finnish, existing data being acted or from a very specific communicative setting. This paper presents the first such corpus, created by annotating 12,000 utterances for emotional arousal and valence, sampled from three large-scale Finnish speech corpora. To ensure diverse affective expression, sample selection was conducted with an affect mining approach combining acoustic, cross-linguistic speech emotion, and text sentiment features. We compare this method to random sampling in terms of annotation diversity, and conduct post-hoc analyses to identify sampling choices that would have maximized the diversity. As an outcome, the work introduces a spontaneous Finnish affective speech corpus and…
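The abstract does not say how the acoustic, speech emotion, and text sentiment features were combined, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes each candidate utterance already has one numeric score per feature type, standardizes and averages those scores, and selects the utterances with the most extreme combined score. The function names, the averaging rule, the "pick the extremes" heuristic, and the histogram entropy used as a diversity proxy are all assumptions made for this example.

```python
import math
import random
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of floats to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against constant features
    return [(v - mu) / sigma for v in values]


def combined_scores(features):
    """Average the z-scored feature streams into one score per utterance.

    `features` maps a (hypothetical) feature name to a list holding one
    value per utterance; all lists must have the same length.
    """
    standardized = [zscore(vals) for vals in features.values()]
    return [mean(col) for col in zip(*standardized)]


def select_extremes(scores, k):
    """Return sorted indices of the k utterances whose score is farthest from zero."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return sorted(order[:k])


def random_sample(n, k, seed=None):
    """Random-sampling baseline: k distinct indices drawn from range(n)."""
    return sorted(random.Random(seed).sample(range(n), k))


def histogram_entropy(values, bins, lo, hi):
    """Shannon entropy in bits of values binned uniformly on [lo, hi].

    Used here as a simple, assumed proxy for annotation diversity.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)
```

To compare the two strategies under these assumptions, one would call `select_extremes` and `random_sample` on the same candidate pool, collect the arousal or valence annotations for each selected set, and compare their `histogram_entropy` values; a higher entropy indicates annotations spread more evenly over the rating scale.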
