Disentangling Textual and Acoustic Features of Neural Speech Representations

Hosein Mohebbi; Grzegorz Chrupała; Willem Zuidema; Afra Alishahi; Ivan Titov

arXiv:2410.03037·cs.CL·October 7, 2024

TL;DR

This paper introduces a disentanglement framework based on the Information Bottleneck principle to separate textual and acoustic features in neural speech models, aiding privacy and interpretability in speech processing tasks.

Contribution

It proposes a novel disentanglement method that isolates content and acoustic features in neural speech representations, enabling better analysis and privacy control.

Findings

01. Demonstrated effective separation of textual and acoustic features
02. Quantified feature contributions across model layers
03. Identified salient speech frames for downstream tasks
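The page does not say how layer-wise contributions were quantified. One common device, assumed here purely for illustration, is a learnable softmax-weighted sum over the outputs of all layers, whose normalized weights can be read as each layer's share of the combined representation. The helper names (`softmax`, `weighted_layer_sum`) and the toy inputs are hypothetical, not taken from the paper or its repository.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalars."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layers: Sequence[Sequence[float]],
                       layer_logits: Sequence[float]) -> List[float]:
    """Combine per-layer feature vectors with softmax-normalized weights.

    `layers[l]` is a (hypothetical) feature vector from layer l for one frame;
    `layer_logits` holds one learnable scalar per layer.
    """
    if len(layers) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    dim = len(layers[0])
    return [sum(w * layer[d] for w, layer in zip(weights, layers))
            for d in range(dim)]


# Toy example: three layers with 2-dimensional features.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [0.0, 0.0, math.log(2.0)]
contributions = softmax(logits)                 # ≈ [0.25, 0.25, 0.5]
combined = weighted_layer_sum(layers, logits)   # ≈ [0.75, 0.75]
```

The softmax keeps the weights positive and summing to one, which is what lets them be read as per-layer shares; in a real model the logits would be trained jointly with the downstream head.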

Abstract

Neural speech models build deeply entangled internal representations, which capture a variety of features (e.g., fundamental frequency, loudness, syntactic category, or semantic content of a word) in a distributed encoding. This complexity makes it difficult to track the extent to which such representations rely on textual and acoustic information, or to suppress the encoding of acoustic features that may pose privacy risks (e.g., gender or speaker identity) in critical, real-world applications. In this paper, we build upon the Information Bottleneck principle to propose a disentanglement framework that separates complex speech representations into two distinct components: one encoding content (i.e., what can be transcribed as text) and the other encoding acoustic features relevant to a given downstream task. We apply our framework to, and evaluate it on, emotion recognition and speaker…
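The abstract does not give the training objective. As a minimal sketch assuming a standard variational Information Bottleneck (VIB) formulation, each encoder outputs a diagonal Gaussian q(z|x), and its loss is a task loss plus β times the KL divergence from a standard normal prior. The two-stage split below (a content latent trained for transcription, then an acoustic latent trained for the downstream task while the content latent is held fixed) is one plausible reading of the two-component design, not the paper's code; the function names, β values, and choice of transcription loss are assumptions, and scalar task losses stand in for real model outputs.

```python
import math
import random
from typing import List, Sequence, Tuple


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    In an autodiff framework this trick lets gradients flow through mu and logvar.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def vib_loss(task_loss: float, mu: Sequence[float], logvar: Sequence[float],
             beta: float) -> float:
    """Generic VIB objective: task loss + beta * KL penalty on the latent."""
    return task_loss + beta * kl_to_standard_normal(mu, logvar)


def two_stage_losses(transcription_loss: float,
                     content_mu: Sequence[float], content_logvar: Sequence[float],
                     downstream_loss: float,
                     acoustic_mu: Sequence[float], acoustic_logvar: Sequence[float],
                     beta_content: float = 1e-3,
                     beta_acoustic: float = 1e-3) -> Tuple[float, float]:
    """Hypothetical two-stage objective (assumed, not the paper's exact loss).

    Stage 1: the content latent is bottlenecked while predicting the transcript.
    Stage 2: with the content encoder frozen, the downstream head is assumed to
    see both latents, so only the acoustic latent is penalized and it need only
    carry task information that the content latent lacks.
    """
    stage1 = vib_loss(transcription_loss, content_mu, content_logvar, beta_content)
    stage2 = vib_loss(downstream_loss, acoustic_mu, acoustic_logvar, beta_acoustic)
    return stage1, stage2


rng = random.Random(0)
z = reparameterize([0.5, -0.5], [-20.0, -20.0], rng)  # tiny variance, so z stays near mu
stage1, stage2 = two_stage_losses(2.0, [1.0], [0.0], 0.8, [0.0, 0.0], [0.0, 0.0],
                                  beta_content=0.1, beta_acoustic=0.1)
# stage1 ≈ 2.05 (KL = 0.5 for mu = 1, unit variance); stage2 ≈ 0.8 (KL = 0)
```

A larger β squeezes more information out of a latent; keeping separate β values lets the content and acoustic stages trade task accuracy against compression independently.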

Code & Models

Repositories

hmohebbi/disentangling_representations
PyTorch (official implementation)

Taxonomy

Topics: Speech Recognition and Synthesis