On Neural Phone Recognition of Mixed-Source ECoG Signals
Ahmed Hussen Abdelaziz, Shuo-Yiin Chang, Nelson Morgan, Erik Edwards, Dorothea Kolossa, Dan Ellis, David A. Moses, Edward F. Chang

TL;DR
This paper shows that neural speech recognition (NSR) from electrocorticography degrades significantly less, in relative terms, than automatic speech recognition (ASR) when tested on mixed-source speech. The authors read this as objective evidence that listeners attend to a single speech source while suppressing interfering signals.
Contribution
It improves the authors' previously published NSR framework by initializing from manual alignments instead of a flat start and by accounting for transcription mismatch.
Findings
The relative performance degradation of NSR in mixed-source scenarios is significantly lower than that of ASR.
Initializing with manual alignments instead of a flat start significantly improves NSR performance.
Accounting for transcription mismatch further improves NSR performance.
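The first finding compares relative degradation rather than absolute scores. A minimal sketch of that comparison follows, assuming relative degradation means the fraction of the clean-condition score lost in the mixed-source condition; the paper's exact metric is not given in this summary. The function name `relative_degradation` and all numbers are hypothetical and are not results from the paper.

```python
def relative_degradation(clean_score: float, mixed_score: float) -> float:
    """Fraction of the clean-condition score lost in the mixed-source condition.

    Assumed definition for illustration; the paper may use a different metric.
    """
    if clean_score <= 0:
        raise ValueError("clean_score must be positive")
    return (clean_score - mixed_score) / clean_score


# Hypothetical accuracies, for illustration only (not taken from the paper).
asr_drop = relative_degradation(clean_score=0.80, mixed_score=0.40)
nsr_drop = relative_degradation(clean_score=0.50, mixed_score=0.45)

# A system can have the lower absolute score and still degrade less in relative terms.
print(f"ASR relative degradation: {asr_drop:.2f}")
print(f"NSR relative degradation: {nsr_drop:.2f}")
```

Normalizing by the clean-condition score is what makes the two systems comparable even when their absolute performance levels differ.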
Abstract
The emerging field of neural speech recognition (NSR) using electrocorticography has recently attracted remarkable research interest for studying how human brains recognize speech in quiet and noisy surroundings. In this study, we demonstrate the utility of NSR systems to objectively prove the ability of human beings to attend to a single speech source while suppressing the interfering signals in a simulated cocktail party scenario. The experimental results show that the relative degradation of the NSR system performance when tested in a mixed-source scenario is significantly lower than that of automatic speech recognition (ASR). In this paper, we have significantly enhanced the performance of our recently published framework by using manual alignments for initialization instead of the flat start technique. We have also improved the NSR system performance by accounting for the possible…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Blind Source Separation Techniques
