Homophone Reveals the Truth: A Reality Check for Speech2Vec

Guangyu Chen

arXiv:2209.10791·cs.CL·September 26, 2022


TL;DR

This paper examines the authenticity of the released Speech2Vec embeddings with a homophone-based inspection, reproduces the model, and finds that it fails to produce effective semantic embeddings, which calls the originally reported results into question.

Contribution

It introduces a homophone-based method to verify speech embeddings and reproduces Speech2Vec, revealing its inability to generate meaningful semantic representations.
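
The idea behind a homophone-based check can be illustrated with a small sketch. This is not the paper's actual procedure, which is not spelled out here; it assumes that a model trained only on audio cannot tell apart words that sound identical, so genuine speech embeddings should place homophones much closer together than unrelated words. The function name `homophone_gap`, the word pairs, and the toy vectors are all hypothetical and chosen only for illustration.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def homophone_gap(embeddings, homophone_pairs, control_pairs):
    """Mean cosine similarity of homophone pairs minus that of control pairs.

    Assumption (hypothetical test statistic, not taken from the paper):
    a large positive gap is what one would expect from embeddings learned
    from audio; a gap near zero or below hints at a text-based origin.
    """
    def average_similarity(pairs):
        sims = [
            cosine(embeddings[a], embeddings[b])
            for a, b in pairs
            if a in embeddings and b in embeddings
        ]
        if not sims:
            raise ValueError("no pair is covered by the vocabulary")
        return mean(sims)

    return average_similarity(homophone_pairs) - average_similarity(control_pairs)


# Toy vectors, invented for illustration only.
speech_like = {
    "two": [1.0, 0.10], "too": [1.0, 0.12],   # homophones nearly coincide
    "sea": [0.0, 1.00], "see": [0.05, 1.00],
}
text_like = {
    "two": [1.0, 0.0], "too": [0.0, 1.0],     # homophones are unrelated
    "sea": [1.0, 1.0], "see": [-1.0, 1.0],
}
homophones = [("two", "too"), ("sea", "see")]
controls = [("two", "sea"), ("too", "see")]

if __name__ == "__main__":
    print("speech-like gap:", homophone_gap(speech_like, homophones, controls))
    print("text-like gap:  ", homophone_gap(text_like, homophones, controls))
```

On real data, the embedding dictionary would be loaded from the released vectors and the pair lists drawn from a pronunciation dictionary; both steps are left out here because they depend on resources not described on this page.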

Findings

01 Homophone inspection suggests the embeddings are not generated by Speech2Vec.

02 Reproduced Speech2Vec fails to learn effective semantic embeddings.

03 Speech2Vec's reported performance is significantly overestimated.

Abstract

Generating spoken word embeddings that possess semantic information is a fascinating topic. Compared with text-based embeddings, they cover both phonetic and semantic characteristics, which can provide richer information and are potentially helpful for improving ASR and speech translation systems. In this paper, we review and examine the authenticity of a seminal work in this field: Speech2Vec. First, a homophone-based inspection method is proposed to check the speech embeddings released by the author of Speech2Vec. There is no indication that these embeddings are generated by the Speech2Vec model. Moreover, through further analysis of the vocabulary composition, we suspect that a text-based model fabricates these embeddings. Finally, we reproduce the Speech2Vec model, referring to the official code and optimal settings in the original paper. Experiments showed that this model failed to…


Code & Models

Repositories

my-yy/s2v_rc (PyTorch, official)


Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling