A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework

Zheng Nan; Ting Dang; Vidhyasaharan Sethu; Beena Ahmed

arXiv:2409.15357·eess.AS·September 25, 2024

TL;DR

This paper introduces a novel spectro-temporal relational thinking framework for acoustic modeling in speech recognition, leveraging probabilistic graphs to improve phoneme recognition accuracy, especially for vowels.

Contribution

It presents a new framework that models speech relations across time and frequency domains using probabilistic graphs, enhancing recognition performance.
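The idea of relating speech segments across time and frequency through probabilistic graphs can be illustrated with a minimal sketch. This is not the authors' implementation: the patch-based node definition, the cosine-similarity edge score, the sigmoid mapping, and the function names `patch_nodes`, `edge_probabilities`, and `sample_graphs` are all assumptions chosen for illustration.

```python
import numpy as np

def patch_nodes(spec, t_size, f_size):
    """Split a (frames, bins) spectrogram into non-overlapping
    time-frequency patches; each flattened patch is one graph node.
    (Hypothetical node definition, not taken from the paper.)"""
    n_t = spec.shape[0] // t_size
    n_f = spec.shape[1] // f_size
    trimmed = spec[: n_t * t_size, : n_f * f_size]
    patches = trimmed.reshape(n_t, t_size, n_f, f_size).transpose(0, 2, 1, 3)
    return patches.reshape(n_t * n_f, t_size * f_size)

def edge_probabilities(nodes):
    """Map pairwise cosine similarity to an edge probability with a sigmoid.
    (Assumed scoring rule; a learned network could take its place.)"""
    normed = nodes / (np.linalg.norm(nodes, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T
    probs = 1.0 / (1.0 + np.exp(-sim))
    np.fill_diagonal(probs, 0.0)  # no self-loops
    return probs

def sample_graphs(probs, n_graphs, rng):
    """Draw several undirected graphs, one Bernoulli trial per node pair."""
    draws = rng.random((n_graphs, *probs.shape)) < probs
    upper = np.triu(draws, k=1)  # keep one draw per unordered pair
    return upper | upper.transpose(0, 2, 1)

rng = np.random.default_rng(0)
spec = rng.standard_normal((40, 32))   # stand-in for a log-mel spectrogram
nodes = patch_nodes(spec, t_size=10, f_size=8)
probs = edge_probabilities(nodes)
graphs = sample_graphs(probs, n_graphs=5, rng=rng)
```

Here each sampled adjacency matrix links patches that may be far apart in time, in frequency, or in both, which is the kind of joint spectro-temporal relation the framework is described as modeling.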

Findings

01. Achieved a 7.82% improvement in phoneme recognition on TIMIT.

02. The model particularly improves vowel recognition accuracy.

03. Relational modeling captures complex speech dependencies.

Abstract

Relational thinking refers to the inherent ability of humans to form mental impressions about relations between sensory signals and prior knowledge, and subsequently incorporate them into their model of their world. Despite the crucial role relational thinking plays in human understanding of speech, it has yet to be leveraged in any artificial speech recognition systems. Recently, there have been some attempts to correct this oversight, but these have been limited to coarse utterance-level models that operate exclusively in the time domain. In an attempt to narrow the gap between artificial systems and human abilities, this paper presents a novel spectro-temporal relational thinking based acoustic modeling framework. Specifically, it first generates numerous probabilistic graphs to model the relationships among speech segments across both time and frequency domains. The relational…

Taxonomy

Topics: Speech and dialogue systems