Homophone Reveals the Truth: A Reality Check for Speech2Vec
Guangyu Chen

TL;DR
This paper examines the authenticity of the released Speech2Vec embeddings with a homophone-based inspection, then reproduces the model and finds that it fails to learn effective semantic embeddings, calling the originally reported results into question.
Contribution
It introduces a homophone-based method to verify speech embeddings and reproduces Speech2Vec, revealing its inability to generate meaningful semantic representations.
Findings
Homophone inspection finds no indication that the released embeddings were generated by Speech2Vec.
Reproduced Speech2Vec fails to learn effective semantic embeddings.
Speech2Vec's reported performance is significantly overestimated.
Abstract
Generating spoken word embeddings that possess semantic information is a fascinating topic. Compared with text-based embeddings, they cover both phonetic and semantic characteristics, which can provide richer information and are potentially helpful for improving ASR and speech translation systems. In this paper, we review and examine the authenticity of a seminal work in this field: Speech2Vec. First, a homophone-based inspection method is proposed to check the speech embeddings released by the author of Speech2Vec. There is no indication that these embeddings are generated by the Speech2Vec model. Moreover, through further analysis of the vocabulary composition, we suspect that a text-based model fabricates these embeddings. Finally, we reproduce the Speech2Vec model, referring to the official code and optimal settings in the original paper. Experiments showed that this model failed to…
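The abstract does not spell out how the homophone-based inspection works, so the following is only a minimal sketch of one plausible reading. It assumes that embeddings learned from audio should place homophones (words that sound the same, such as "two" and "too") closer together than unrelated word pairs, whereas text-based embeddings have no reason to. The function names, the toy vectors, and the word pairs are all hypothetical and are not taken from the paper or from the released Speech2Vec embeddings.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def mean_pair_similarity(embeddings, pairs):
    """Average cosine similarity over word pairs present in the vocabulary."""
    sims = [
        cosine(embeddings[a], embeddings[b])
        for a, b in pairs
        if a in embeddings and b in embeddings
    ]
    return float(np.mean(sims)) if sims else float("nan")


def homophone_gap(embeddings, homophone_pairs, control_pairs):
    """Homophone similarity minus control-pair similarity.

    Assumption: a clearly positive gap is what one would expect from
    embeddings that carry phonetic information; a gap near zero is what
    one would expect if they do not.
    """
    return mean_pair_similarity(embeddings, homophone_pairs) - mean_pair_similarity(
        embeddings, control_pairs
    )


# Toy data only: synthetic vectors standing in for real embeddings.
rng = np.random.default_rng(0)
dim = 50


def near(v, noise=0.05):
    """Return a slightly perturbed copy of v (simulates 'sounds the same')."""
    return v + noise * rng.standard_normal(dim)


base = {w: rng.standard_normal(dim) for w in ["two", "there", "cat", "river"]}

# Hypothetical speech-like space: homophones share almost the same vector.
speech_like = {
    "two": base["two"],
    "too": near(base["two"]),
    "there": base["there"],
    "their": near(base["there"]),
    "cat": base["cat"],
    "river": base["river"],
}

# Hypothetical text-like space: every word gets an independent vector.
text_like = {w: rng.standard_normal(dim) for w in speech_like}

homophones = [("two", "too"), ("there", "their")]
controls = [("two", "cat"), ("there", "river")]

gap_speech = homophone_gap(speech_like, homophones, controls)
gap_text = homophone_gap(text_like, homophones, controls)
print(f"speech-like gap: {gap_speech:.3f}")
print(f"text-like gap:   {gap_text:.3f}")
```

On real data, the same comparison would be run over a larger homophone list and a matched set of control pairs. A gap near zero would be consistent with the paper's conclusion but would not prove it on its own.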
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
