Quantifying Dimensional Independence in Speech: An Information-Theoretic Framework for Disentangled Representation Learning

Bipasha Kashyap; Björn W. Schuller; Pubudu N. Pathirana

arXiv:2602.20592 · cs.SD · February 25, 2026

Open Access

TL;DR

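This paper introduces an information-theoretic framework for quantifying statistical dependence among the emotional, linguistic, and pathological dimensions of speech, measured on handcrafted acoustic features. It finds weak coupling across dimensions, with source features dominating the emotional dimension and filter features dominating the linguistic and pathological ones.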

Contribution

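The authors combine bounded neural mutual information (MI) estimation with non-parametric validation to measure cross-dimension dependence in handcrafted acoustic features. This gives a direct measure of disentanglement, rather than inferring it indirectly from downstream task performance.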

Findings

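01. Cross-dimension MI is low (< 0.15 nats), indicating weak statistical dependence between dimensions.

02. Source-filter MI is substantially higher (0.47 nats), indicating stronger coupling between source and filter features.

03. Source features dominate emotional dimensions (80% of total MI), while filter features dominate linguistic and pathological ones.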
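To give a rough sense of scale for the MI values quoted above, the sketch below converts nats to bits and computes the correlation a bivariate Gaussian pair would need in order to carry the same MI. The Gaussian assumption is purely illustrative and is not stated in the paper; the function names are hypothetical.

```python
import math


def nats_to_bits(mi_nats: float) -> float:
    """Convert mutual information from nats to bits (1 nat = 1/ln 2 bits)."""
    return mi_nats / math.log(2)


def gaussian_equivalent_correlation(mi_nats: float) -> float:
    """Return |rho| for a bivariate Gaussian with the given MI.

    Assumption (not from the paper): for a bivariate Gaussian,
    I = -0.5 * ln(1 - rho^2), hence |rho| = sqrt(1 - exp(-2 * I)).
    """
    return math.sqrt(1.0 - math.exp(-2.0 * mi_nats))


# MI values as reported in the findings (in nats).
reported = [("cross-dimension (upper bound)", 0.15), ("source-filter", 0.47)]

for label, mi in reported:
    bits = nats_to_bits(mi)
    rho = gaussian_equivalent_correlation(mi)
    print(f"{label}: {mi:.2f} nats = {bits:.3f} bits, Gaussian-equivalent |rho| = {rho:.3f}")
```

Because the features need not be Gaussian, the equivalent correlation is only an intuition aid, not a property of the data.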

Abstract

Speech signals encode emotional, linguistic, and pathological information within a shared acoustic channel; however, disentanglement is typically assessed indirectly through downstream task performance. We introduce an information-theoretic framework to quantify cross-dimension statistical dependence in handcrafted acoustic features by integrating bounded neural mutual information (MI) estimation with non-parametric validation. Across six corpora, cross-dimension MI remains low, with tight estimation bounds (< 0.15 nats), indicating weak statistical coupling in the data considered, whereas source-filter MI is substantially higher (0.47 nats). Attribution analysis, defined as the proportion of total MI attributable to source versus filter components, reveals source dominance for emotional dimensions (80%) and filter dominance for linguistic and pathological dimensions (60% and 58%,…
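The attribution analysis described in the abstract is a proportion of total MI. A minimal sketch of that arithmetic follows, assuming the total is simply the sum of the source and filter MI for a given dimension; the paper's exact definition may differ. The function name and the input values are hypothetical and chosen only to illustrate an 80/20 split.

```python
def attribution_shares(mi_source: float, mi_filter: float) -> tuple[float, float]:
    """Return (source_share, filter_share) as fractions of total MI.

    Assumption: total MI = mi_source + mi_filter (both in nats).
    """
    if mi_source < 0 or mi_filter < 0:
        raise ValueError("MI estimates must be non-negative")
    total = mi_source + mi_filter
    if total == 0:
        raise ValueError("total MI is zero; shares are undefined")
    return mi_source / total, mi_filter / total


# Hypothetical MI values (nats) between a dimension label and each feature group.
source_share, filter_share = attribution_shares(mi_source=0.08, mi_filter=0.02)
print(f"source: {source_share:.0%}, filter: {filter_share:.0%}")
```

The shares always sum to one, so a reported source share of 80% implies a filter share of 20% under this definition.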

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Neural dynamics and brain function