Discrete representations in neural models of spoken language
Bertrand Higy, Lieke Gelderloos, Afra Alishahi, Grzegorz Chrupała

TL;DR
This paper evaluates how different metrics assess discrete neural representations in spoken language models, revealing inconsistencies and limitations in current evaluation methods, and examines how well these representations correlate with linguistic units.
Contribution
It systematically compares four metrics for analyzing vector-quantized spoken language models and discusses their effectiveness and limitations.
Findings
Different metrics yield inconsistent evaluation results.
Minimal pair stimuli disadvantage larger discrete inventories.
Vector quantization moderately correlates with linguistic units.
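The correlation finding presupposes some way of scoring how well discrete codes line up with linguistic units. One common choice is normalized mutual information (NMI) between the code assigned to each frame and a frame-level phoneme annotation. The minimal sketch below assumes such frame-aligned labels are available and normalizes by the arithmetic mean of the two entropies. It is an illustration, not necessarily the formulation used in the paper, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(xs, ys):
    """Mutual information (in nats) between two aligned label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def normalized_mutual_information(codes, phonemes):
    """Hypothetical helper: NMI normalized by the mean of both entropies, in [0, 1]."""
    h_codes, h_phones = entropy(codes), entropy(phonemes)
    if h_codes == 0 and h_phones == 0:
        # Both labelings are constant, so they agree trivially.
        return 1.0
    return 2 * mutual_information(codes, phonemes) / (h_codes + h_phones)


# A one-to-one code/phoneme mapping scores 1; splitting each phoneme
# across two codes (a larger inventory) lowers the score to 2/3.
print(normalized_mutual_information([3, 3, 7, 7, 1, 1], ["a", "a", "t", "t", "s", "s"]))
print(normalized_mutual_information([0, 1, 2, 3], ["a", "a", "b", "b"]))
```

With this normalization, a codebook that splits one phoneme over several codes is penalized even when every code is phonemically pure, which is one reason inventory size matters when comparing scores across discretization layers.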
Abstract
The distributed and continuous representations used by neural networks are at odds with representations employed in linguistics, which are typically symbolic. Vector quantization has been proposed as a way to induce discrete neural representations that are closer in nature to their linguistic counterparts. However, it is not clear which metrics are best suited to analyze such discrete representations. We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language. We examine the results they yield when applied to two different models, while systematically studying the effect of the placement and size of the discretization layer. We find that different evaluation regimes can give inconsistent results. While in most cases we can attribute these inconsistencies to the properties of the different metrics, one point of concern remains: the use of minimal…
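The discretization step described in the abstract can be sketched as a nearest-neighbour lookup: each continuous frame vector is replaced by the closest entry in a codebook, whose number of entries K we take to correspond to the size of the discretization layer. The minimal forward-pass sketch below assumes Euclidean distance and a fixed codebook. Training such a layer (commonly done with a straight-through gradient estimator and auxiliary codebook/commitment losses, as in VQ-VAE) is not shown, and the function names are hypothetical rather than the authors' code.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(
    frames: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[Vector]]:
    """Hypothetical helper: map each frame to its nearest codebook entry.

    Returns the discrete code index per frame and the quantized vectors
    (the selected codebook entries) that a downstream layer would consume.
    """
    indices: List[int] = []
    quantized: List[Vector] = []
    for frame in frames:
        best = min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        indices.append(best)
        quantized.append(codebook[best])
    return indices, quantized


# A toy codebook with K = 3 entries in a 2-dimensional space.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [0.2, 0.8], [0.05, 0.1]]
indices, quantized = vector_quantize(frames, codebook)
print(indices)  # [0, 1, 2, 0]
```

The index sequence is the discrete representation that metrics such as the one sketched for the findings operate on; the quantized vectors are what the rest of the network sees.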
Taxonomy
Topics: Neural Networks and Applications · Speech Recognition and Synthesis · Topic Modeling
