Prosody as Supervision: Bridging the Non-Verbal–Verbal for Multilingual Speech Emotion Recognition

Girish; Mohd Mujtaba Akhtar; Muskaan Singh

arXiv:2604.17647·eess.AS·April 24, 2026

TL;DR

This paper introduces NOVA ARC, a novel geometry-aware framework that leverages non-verbal vocalizations to improve multilingual speech emotion recognition, especially in low-resource settings, by transferring supervision from non-verbal to verbal speech.

Contribution

The paper proposes a non-verbal-to-verbal transfer paradigm and a geometry-aware model for multilingual speech emotion recognition, and reports that the model outperforms existing Euclidean and self-supervised learning (SSL) baselines.

Findings

1. NOVA ARC achieves the strongest performance in non-verbal-to-verbal adaptation.

2. NOVA ARC outperforms its Euclidean counterparts and strong SSL baselines.

3. The paper presents itself as the first to introduce non-verbal-to-verbal transfer for speech emotion recognition (SER).

Abstract

In this work, we introduce a paralinguistic supervision paradigm for low-resource multilingual speech emotion recognition (LRM-SER) that leverages non-verbal vocalizations to exploit prosody-centric emotion cues. Unlike conventional SER systems that rely heavily on labeled verbal speech and suffer from poor cross-lingual transfer, our approach reformulates LRM-SER as non-verbal-to-verbal transfer, where supervision from a labeled non-verbal source domain is adapted to unlabeled verbal speech across multiple target languages. To this end, we propose NOVA ARC, a geometry-aware framework that models affective structure in the Poincaré ball, discretizes paralinguistic patterns via a hyperbolic vector-quantized prosody codebook, and captures emotion intensity through a hyperbolic emotion lens. For unsupervised adaptation, NOVA-ARC performs optimal-transport-based prototype alignment…
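
The abstract does not spell out how the hyperbolic codebook is queried, so the following is only a minimal sketch of the general idea, not the authors' implementation. It uses the standard geodesic distance on the unit Poincaré ball and assigns an embedding to its nearest codebook entry. The function names `poincare_distance` and `nearest_code`, the 2-D toy codebook, and the choice of unit curvature are all assumptions made for illustration.

```python
import numpy as np

def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance between points inside the unit Poincaré ball.

    Standard formula: d(x, y) = arccosh(1 + 2|x - y|^2 / ((1 - |x|^2)(1 - |y|^2))).
    Inputs broadcast over leading axes; the last axis is the embedding dimension.
    """
    sq_norm_x = np.sum(x * x, axis=-1)
    sq_norm_y = np.sum(y * y, axis=-1)
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    # Clamp the denominator so points very close to the boundary stay finite.
    denom = np.maximum((1.0 - sq_norm_x) * (1.0 - sq_norm_y), eps)
    return np.arccosh(1.0 + 2.0 * sq_diff / denom)

def nearest_code(z, codebook):
    """Index of the codebook entry closest to z under the Poincaré distance.

    Hypothetical helper: illustrates vector quantization with a hyperbolic
    metric in place of the usual Euclidean one.
    """
    return int(np.argmin(poincare_distance(z[None, :], codebook)))

# Toy codebook of three 2-D entries, all strictly inside the unit ball.
codebook = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, -0.8]])
z = np.array([0.4, 0.1])
idx = nearest_code(z, codebook)
```

Because distances grow rapidly near the boundary of the ball, an entry that looks close in Euclidean terms can be far away hyperbolically, which is the usual motivation for quantizing with this metric when the data is assumed to have hierarchical structure.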
