THAI Speech Emotion Recognition (THAI-SER) corpus

Jilamika Wongpithayadisai; Chompakorn Chaksangchaichot; Soravitt Sangnark; Patawee Prakrankamanant; Krit Gangwanpongpun; Siwa Boonpunmongkol; Premmarin Milindasuta; Dangkamon Na-Pombejra; Sarana Nutanong; Ekapol Chuangsuwanich

arXiv:2507.09618 · cs.SD · July 15, 2025

TL;DR

The paper introduces THAI-SER, a Thai speech emotion recognition corpus of 41 hours and 36 minutes (27,854 utterances) that combines scripted and improvised recordings, crowdsourced emotion annotations, and a filtering and quality control scheme.

Contribution

It presents the first sizeable Thai speech emotion corpus, with crowdsourced annotation and quality control, to support emotion recognition models for the Thai language.

Findings

01

Achieved an inter-annotator reliability score of 0.692

02

Human recognition accuracy reached 0.772 after filtering

03

A model trained on the corpus performs well in both in-corpus and cross-corpus evaluations

Abstract

We present THAI-SER, the first sizeable corpus for Thai speech emotion recognition, containing 41 hours and 36 minutes of speech (27,854 utterances) from 100 recordings made in different recording environments: Zoom and two studio setups. The recordings contain both scripted and improvised sessions, acted by 200 professional actors (112 females and 88 males, aged 18 to 55) and directed by professional directors. Each utterance was recorded with one of five primary emotions assigned to the actor: neutral, angry, happy, sad, or frustrated. The utterances are annotated with an emotional category through crowdsourcing. To control annotation quality, we design an extensive filtering and quality control scheme that keeps the majority agreement score above 0.71. We evaluate the annotated corpus using two metrics: inter-annotator reliability and human recognition accuracy.…
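The abstract does not spell out how the majority agreement score or human recognition accuracy is computed, so the following is only a minimal sketch under stated assumptions. It assumes the majority agreement score of an utterance is the share of annotators who chose its most common label, that the 0.71 figure is applied as a per-utterance cutoff, and that human recognition accuracy is the fraction of retained utterances whose majority label matches the emotion assigned to the actor. All function names, utterance IDs, and labels below are hypothetical and do not come from the paper.

```python
from collections import Counter


def majority_agreement(labels):
    """Return (majority_label, share of annotators who chose it).

    Assumption: the score is the vote share of the most common label.
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


def filter_utterances(annotations, threshold=0.71):
    """Keep utterances whose agreement score exceeds the threshold.

    Assumption: the 0.71 value from the abstract is used here as a
    per-utterance cutoff; the paper's actual scheme may differ.
    """
    kept = {}
    for utt_id, labels in annotations.items():
        label, score = majority_agreement(labels)
        if score > threshold:
            kept[utt_id] = label
    return kept


def human_recognition_accuracy(majority_labels, intended):
    """Fraction of utterances whose majority label matches the assigned emotion."""
    if not majority_labels:
        raise ValueError("no utterances to score")
    hits = sum(1 for utt_id, label in majority_labels.items()
               if intended[utt_id] == label)
    return hits / len(majority_labels)


# Hypothetical toy data: crowdsourced labels per utterance.
annotations = {
    "utt1": ["angry", "angry", "angry", "frustrated"],  # agreement 0.75
    "utt2": ["sad", "neutral", "sad", "happy"],         # agreement 0.50
    "utt3": ["happy", "happy", "happy", "happy"],       # agreement 1.00
}
# Hypothetical emotions assigned to the actors at recording time.
intended = {"utt1": "frustrated", "utt2": "sad", "utt3": "happy"}

kept = filter_utterances(annotations)
accuracy = human_recognition_accuracy(kept, intended)
print(kept, accuracy)
```

In this toy run, `utt2` is dropped because only half of its annotators agree, and accuracy is computed over the two remaining utterances only, which illustrates why a reported accuracy "after filtering" can differ from one computed on the unfiltered set.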

Taxonomy

Topics: Speech Recognition and Synthesis