Topological Data Analysis for Speech Processing
Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, Serguei Barannikov, Irina Piontkovskaya, Sergey Nikolenko, Evgeny Burnaev

TL;DR
This paper demonstrates that topological data analysis (TDA) applied to speech data and models can improve classification accuracy and reveal functional roles of model components, offering a new structural perspective in speech processing.
Contribution
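It introduces topological and algebraic features derived from Transformer attention maps and embeddings of a pretrained speech model (HuBERT). A simple linear classifier on these features outperforms a fine-tuned classification head, and the features provide insight into model internals.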
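To make "topological features of an attention map" concrete, here is a minimal sketch of one common construction: treat tokens as graph vertices, turn attention weights into distances, and read off the zero-dimensional persistence barcode (H0). It relies on a standard fact: for a weighted graph filtration, the finite H0 bars die exactly at the edge weights of a minimum spanning tree. The distance d_ij = 1 - max(a_ij, a_ji), the function names, and the three summary statistics are assumptions chosen for illustration; this summary does not specify the paper's exact feature set.

```python
def attention_to_distance(attn):
    """Symmetrize an attention map into a distance matrix.

    Assumed convention: d_ij = 1 - max(a_ij, a_ji), so strongly
    attending token pairs end up close together.
    """
    n = len(attn)
    return [
        [0.0 if i == j else 1.0 - max(attn[i][j], attn[j][i]) for j in range(n)]
        for i in range(n)
    ]


def h0_death_times(dist):
    """Death times of the finite H0 bars, computed with Kruskal's algorithm.

    Every connected component is born at 0 and dies when it merges with
    another one, which happens at a minimum-spanning-tree edge weight.
    """
    n = len(dist)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(weight)
    return deaths


def h0_features(attn):
    """Illustrative summary statistics of the H0 barcode of one attention map."""
    deaths = h0_death_times(attention_to_distance(attn))
    return {
        "h0_sum": sum(deaths),
        "h0_max": max(deaths),
        "h0_mean": sum(deaths) / len(deaths),
    }


# Hand-made 3-token attention map (rows sum to 1); not real model output.
toy_attn = [
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
toy_features = h0_features(toy_attn)
```

In a real pipeline the attention maps would come from each layer and head of the speech model, and the per-head statistics would be concatenated into one feature vector per utterance. Higher-dimensional homology would need a dedicated persistence library.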
Findings
Improves accuracy by about 9% on four common datasets.
Reaches 80.155% accuracy on CREMA-D, setting a new state of the art.
Topological features can identify functional roles of Transformer heads.
Abstract
We apply topological data analysis (TDA) to speech classification problems and to the introspection of a pretrained speech model, HuBERT. To this end, we introduce a number of topological and algebraic features derived from Transformer attention maps and embeddings. We show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head. In particular, we achieve an improvement of about 9% in accuracy, together with an improvement in ERR, on four common datasets; on CREMA-D, the proposed feature set reaches a new state-of-the-art performance with an accuracy of 80.155%. We also show that topological features are able to reveal functional roles of speech Transformer heads; e.g., we find heads capable of distinguishing between pairs of sample sources (natural/synthetic) or voices without any downstream fine-tuning. Our results demonstrate that TDA is a promising new…
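The "simple linear classifier on top of such features" step can be sketched as plain logistic regression over one feature vector per utterance. Everything below is an assumption made for illustration: the feature values are synthetic placeholders generated from a seeded random number generator (not the paper's data or results), N_HEADS and fake_features are hypothetical names, and the paper's actual classifier settings are not given in this summary.

```python
import math
import random


def center_columns(rows):
    """Subtract each column's mean; return the centered rows and the means."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows], means


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Binary logistic regression trained with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return class 1 when the linear score is positive, else class 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


# Synthetic placeholder data: one made-up topological statistic per head.
random.seed(0)
N_HEADS = 6  # hypothetical number of attention heads


def fake_features(label):
    base = 0.7 if label == 1 else 0.3
    return [random.gauss(base, 0.05) for _ in range(N_HEADS)]


labels = [i % 2 for i in range(40)]
features = [fake_features(label) for label in labels]

X, column_means = center_columns(features)
weights, bias = fit_logistic(X, labels)
train_accuracy = sum(
    predict(weights, bias, x) == label for x, label in zip(X, labels)
) / len(labels)
```

The backbone stays frozen in this setup: only the weight vector and bias are learned. That is also why the learned weights can hint at which heads carry class-relevant structure.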
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Adam · Absolute Position Encodings · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing
