Speech Recognition Transformers: Topological-lingualism Perspective

Shruti Singh; Muskaan Singh; Virender Kadyan

arXiv:2408.14991·cs.CL·August 28, 2024

TL;DR

This paper surveys transformer-based speech recognition methods, emphasizing a topological-lingualism perspective, covering models, datasets, architectures, and open challenges in multilingual speech processing.

Contribution

It provides a comprehensive overview of speech transformers from a topological-lingualism perspective, highlighting recent advances and future research directions.

Findings

01. Transformers significantly improve speech recognition accuracy.

02. Multilingual and cross-lingual models enable better resource sharing.

03. Open challenges include dataset diversity and model scalability.

Abstract

Transformers have achieved great success across a range of artificial intelligence tasks. Thanks to the recent prevalence of self-attention mechanisms, which capture long-term dependencies, they have produced phenomenal outcomes in speech processing and recognition tasks. This paper presents a comprehensive survey of transformer techniques for the speech modality. The main contents of the survey include (1) background on traditional ASR, the end-to-end transformer ecosystem, and speech transformers; (2) foundational speech models viewed through the lingualism paradigm, i.e., monolingual, bilingual, multilingual, and cross-lingual; (3) datasets and languages, acoustic features, architecture, decoding, and evaluation metrics from a specific topological-lingualism perspective; and (4) popular speech transformer toolkits for building end-to-end ASR systems. Finally, highlight the discussion of open challenges and…
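The self-attention mechanism the abstract credits with capturing long-term dependencies can be illustrated with a minimal sketch. This is not the paper's code: the function names, the toy input, and the use of the input frames directly as queries, keys, and values (identity projections) are all assumptions made for illustration; real speech transformers learn separate projection matrices and use multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    Assumption: queries, keys, and values are the input frames themselves
    (identity projections), which keeps the sketch dependency-free.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        attn = softmax(scores)
        weights.append(attn)
        # Each output is a weighted average of all frames.
        outputs.append(
            [sum(a * v[d] for a, v in zip(attn, frames)) for d in range(dim)]
        )
    return outputs, weights


# Hypothetical toy input: four 2-dimensional "acoustic frames".
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
outputs, weights = self_attention(frames)
```

In this toy input, frame 0 attends to frame 2 exactly as strongly as to itself, because the weights depend on content similarity rather than on distance in the sequence. That position-independent access is what lets attention relate far-apart frames.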

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Robotics and Automated Systems