Quantifying Dimensional Independence in Speech: An Information-Theoretic Framework for Disentangled Representation Learning
Bipasha Kashyap, Björn W. Schuller, Pubudu N. Pathirana

TL;DR
This paper introduces an information-theoretic framework to quantify statistical dependence among speech features, revealing weak coupling across dimensions and source/filter dominance for different speech attributes.
Contribution
The authors develop a mutual information (MI) method for measuring cross-dimension dependence in handcrafted acoustic speech features, giving a direct, principled check on disentanglement instead of inferring it from downstream task performance.
Findings
Cross-dimension MI is low (<0.15 nats), indicating weak dependence.
Source–Filter MI is substantially higher (0.47 nats), indicating stronger coupling.
Source dominates emotional dimensions, while filter dominates linguistic and pathological ones.
Abstract
Speech signals encode emotional, linguistic, and pathological information within a shared acoustic channel; however, disentanglement is typically assessed indirectly through downstream task performance. We introduce an information-theoretic framework to quantify cross-dimension statistical dependence in handcrafted acoustic features by integrating bounded neural mutual information (MI) estimation with non-parametric validation. Across six corpora, cross-dimension MI remains low, with tight estimation bounds (<0.15 nats), indicating weak statistical coupling in the data considered, whereas Source–Filter MI is substantially higher (0.47 nats). Attribution analysis, defined as the proportion of total MI attributable to source versus filter components, reveals source dominance for emotional dimensions (80%) and filter dominance for linguistic and pathological dimensions (60% and 58%,…
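To make the two quantities in the abstract concrete, here is a minimal sketch rather than the authors' method. It swaps in a simple plug-in (counting) MI estimator over discrete symbols in place of the paper's bounded neural estimator, and it assumes "total MI" in the attribution analysis means the sum of the source and filter MI values. The function names `plugin_mi_nats` and `attribution_shares` and all input values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def plugin_mi_nats(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two equal-length discrete sequences.

    Assumption: inputs are already discretised (e.g. binned feature values).
    This is a basic counting estimator, not a neural MI bound.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def attribution_shares(mi_source, mi_filter):
    """Split total MI into source and filter proportions.

    Assumption: total MI is taken as mi_source + mi_filter.
    """
    total = mi_source + mi_filter
    if total <= 0:
        raise ValueError("total MI must be positive")
    return mi_source / total, mi_filter / total


# Illustrative inputs only, not data from the paper.
dependent = plugin_mi_nats([0, 1, 0, 1], [0, 1, 0, 1])    # fully coupled symbols
independent = plugin_mi_nats([0, 0, 1, 1], [0, 1, 0, 1])  # no coupling
source_share, filter_share = attribution_shares(0.8, 0.2)
```

On these toy inputs, identical binary sequences give the maximum MI for a fair binary variable (ln 2 nats), while the independent pairing gives zero. Real acoustic features are continuous, so a counting estimator like this would need binning and bias correction before its values could be compared with the figures reported above.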
Taxonomy
Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Neural dynamics and brain function
