ParaMETA: Towards Learning Disentangled Paralinguistic Speaking Styles Representations from Speech

Haowei Lou; Hye-young Paik; Wen Hu; Lina Yao

arXiv:2601.12289·cs.SD·January 21, 2026

TL;DR

ParaMETA is a novel framework that learns disentangled, task-specific embeddings for various speaking styles from speech, enabling improved recognition and fine-grained style control in speech generation.

Contribution

The paper introduces a unified, flexible model that learns disentangled style representations for multiple paralinguistic tasks, reducing inter-task interference and enabling style control in text-to-speech (TTS).

Findings

01 Outperforms baselines in classification accuracy

02 Generates more natural and expressive speech

03 Supports multi-style control in TTS applications

Abstract

Learning representative embeddings for different types of speaking styles, such as emotion, age, and gender, is critical for both recognition tasks (e.g., cognitive computing and human-computer interaction) and generative tasks (e.g., style-controllable speech generation). In this work, we introduce ParaMETA, a unified and flexible framework for learning and controlling speaking styles directly from speech. Unlike existing methods that rely on single-task models or cross-modal alignment, ParaMETA learns disentangled, task-specific embeddings by projecting speech into dedicated subspaces for each type of style. This design reduces inter-task interference, mitigates negative transfer, and allows a single model to handle multiple paralinguistic tasks such as emotion, gender, age, and language classification. Beyond recognition, ParaMETA enables fine-grained style control in Text-To-Speech…
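
To make the idea of "dedicated subspaces for each type of style" concrete, the following is a minimal sketch, not the authors' architecture. It assumes a shared utterance-level speech embedding is already available and gives each task its own linear projection and its own classifier. The class name `TaskSubspaceProjector`, the dimensions, the class counts in `TASKS`, and the random untrained weights are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical tasks and class counts; the abstract names the tasks
# but not the label sets, so these numbers are assumptions.
TASKS = {"emotion": 4, "gender": 2, "age": 3, "language": 2}


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TaskSubspaceProjector:
    """One linear projection and one linear classifier per task.

    Each task reads the same shared embedding but writes to its own
    subspace, so a classifier for one task never sees another task's
    embedding. This is an illustrative stand-in, not the paper's model.
    """

    def __init__(self, shared_dim, style_dim, tasks, seed=0):
        rng = random.Random(seed)
        self.projections = {
            task: make_matrix(style_dim, shared_dim, rng) for task in tasks
        }
        self.classifiers = {
            task: make_matrix(n_classes, style_dim, rng)
            for task, n_classes in tasks.items()
        }

    def embed(self, shared):
        """Project the shared embedding into every task-specific subspace."""
        return {task: matvec(w, shared) for task, w in self.projections.items()}

    def classify(self, shared):
        """Return the argmax class index for each task."""
        predictions = {}
        for task, embedding in self.embed(shared).items():
            logits = matvec(self.classifiers[task], embedding)
            predictions[task] = max(range(len(logits)), key=logits.__getitem__)
        return predictions


# Stand-in for the output of a speech encoder (assumed, 16-dimensional).
rng = random.Random(1)
shared = [rng.gauss(0.0, 1.0) for _ in range(16)]

model = TaskSubspaceProjector(shared_dim=16, style_dim=8, tasks=TASKS)
embeddings = model.embed(shared)
predictions = model.classify(shared)
```

Because the weights are random, the predictions carry no meaning here; the sketch only shows the data flow in which one input yields a separate embedding per style type. In a style-controllable TTS setting, one could imagine swapping a single task's embedding (for example `embeddings["emotion"]`) while leaving the others fixed, though how ParaMETA actually conditions the synthesizer is not stated in the text above.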

Taxonomy

Topics: Emotion and Mood Recognition · Topic Modeling · Authorship Attribution and Profiling