A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework
Zheng Nan, Ting Dang, Vidhyasaharan Sethu, Beena Ahmed

TL;DR
This paper introduces a novel spectro-temporal relational thinking framework for acoustic modeling in speech recognition, leveraging probabilistic graphs to improve phoneme recognition accuracy, especially for vowels.
Contribution
It presents a new framework that models speech relations across time and frequency domains using probabilistic graphs, enhancing recognition performance.
Findings
Achieves a 7.82% improvement in phoneme recognition on the TIMIT dataset.
The model improves vowel recognition accuracy in particular.
Relational modeling captures complex speech dependencies.
Abstract
Relational thinking refers to the inherent ability of humans to form mental impressions about relations between sensory signals and prior knowledge, and subsequently incorporate them into their model of their world. Despite the crucial role relational thinking plays in human understanding of speech, it has yet to be leveraged in any artificial speech recognition systems. Recently, there have been some attempts to correct this oversight, but these have been limited to coarse utterance-level models that operate exclusively in the time domain. In an attempt to narrow the gap between artificial systems and human abilities, this paper presents a novel spectro-temporal relational thinking based acoustic modeling framework. Specifically, it first generates numerous probabilistic graphs to model the relationships among speech segments across both time and frequency domains. The relational…
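As a rough illustration of the first step described in the abstract (building probabilistic graphs over speech segments across both time and frequency), the sketch below cuts a toy spectrogram into time-frequency patches, assigns each pair of patches an edge probability, and samples one graph. The abstract is truncated here and does not give the paper's formulation, so everything in the sketch is an assumption made for illustration: the patch sizes, the scaled dot-product score, the sigmoid mapping, the Bernoulli edge sampling, and all function names are hypothetical and should not be read as the authors' method.

```python
import math
import random


def split_into_patches(spectrogram, t_size, f_size):
    """Cut a (time x frequency) grid into non-overlapping patches.

    Each patch is flattened into a plain list of t_size * f_size values.
    Patch sizes are an illustrative assumption, not taken from the paper.
    """
    n_t, n_f = len(spectrogram), len(spectrogram[0])
    patches = []
    for t0 in range(0, n_t - t_size + 1, t_size):
        for f0 in range(0, n_f - f_size + 1, f_size):
            patches.append([
                spectrogram[t][f]
                for t in range(t0, t0 + t_size)
                for f in range(f0, f0 + f_size)
            ])
    return patches


def edge_probabilities(patches):
    """Map each pair of patches to an edge probability in [0, 1].

    Assumption: score = scaled dot product, probability = sigmoid(score).
    The sigmoid is computed via tanh to avoid overflow for large scores.
    """
    n, dim = len(patches), len(patches[0])
    probs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = sum(a * b for a, b in zip(patches[i], patches[j])) / math.sqrt(dim)
            p = 0.5 * (1.0 + math.tanh(score / 2.0))
            probs[i][j] = probs[j][i] = p
    return probs


def sample_graph(probs, rng):
    """Draw one undirected graph: each edge is an independent Bernoulli trial."""
    n = len(probs)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < probs[i][j]:
                adj[i][j] = adj[j][i] = 1
    return adj


# Toy input: 6 frames x 8 frequency bins of random values (not real speech).
rng = random.Random(0)
spec = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]

patches = split_into_patches(spec, t_size=2, f_size=4)  # 3 time x 2 frequency patches
probs = edge_probabilities(patches)
graph = sample_graph(probs, rng)
```

Calling `sample_graph` repeatedly with the same `probs` yields different graphs, which is one simple way to obtain many probabilistic graphs over the same set of spectro-temporal segments.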
Taxonomy
Topics: Speech and dialogue systems
