On Neural Phone Recognition of Mixed-Source ECoG Signals

Ahmed Hussen Abdelaziz; Shuo-Yiin Chang; Nelson Morgan; Erik Edwards; Dorothea Kolossa; Dan Ellis; David A. Moses; Edward F. Chang

arXiv:1912.05869·eess.AS·December 13, 2019

TL;DR

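This paper uses neural speech recognition (NSR) based on electrocorticography (ECoG) to show that human listeners attend to a single speech source in a simulated cocktail party scenario: when tested on mixed-source input, the NSR system's relative performance degradation is significantly lower than that of automatic speech recognition (ASR).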

Contribution

It improves the authors' previously published NSR framework by initializing training with manual alignments instead of a flat start and by accounting for transcription mismatch.

Findings

01. In mixed-source scenarios, the relative performance degradation of NSR is significantly lower than that of ASR.

02. Initializing with manual alignments instead of a flat start significantly improves NSR performance.

03. Accounting for transcription mismatch further improves NSR performance.
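
As a rough illustration of the first finding, relative degradation can be read as the fraction of single-source performance that is lost under mixed-source testing. The sketch below assumes an accuracy-style score where higher is better. The function name and the example numbers are hypothetical and are not taken from the paper, which may define the measure differently.

```python
def relative_degradation(clean_score: float, mixed_score: float) -> float:
    """Fraction of the single-source score lost under mixed-source testing.

    Assumes an accuracy-style score (higher is better). This is an
    illustrative definition, not necessarily the one used in the paper.
    """
    if clean_score <= 0:
        raise ValueError("clean_score must be positive")
    return (clean_score - mixed_score) / clean_score


# Placeholder numbers for illustration only; they are NOT results from the paper.
hypothetical_nsr = relative_degradation(clean_score=0.50, mixed_score=0.45)
hypothetical_asr = relative_degradation(clean_score=0.80, mixed_score=0.40)

# Under these made-up scores the NSR system loses a smaller fraction of its
# single-source performance than the ASR system does.
print(f"NSR: {hypothetical_nsr:.2f}, ASR: {hypothetical_asr:.2f}")
```

Comparing relative rather than absolute drops keeps the comparison meaningful when the two systems start from very different single-source baselines.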

Abstract

The emerging field of neural speech recognition (NSR) using electrocorticography has recently attracted remarkable research interest for studying how human brains recognize speech in quiet and noisy surroundings. In this study, we demonstrate the utility of NSR systems to objectively prove the ability of human beings to attend to a single speech source while suppressing the interfering signals in a simulated cocktail party scenario. The experimental results show that the relative degradation of the NSR system performance when tested in a mixed-source scenario is significantly lower than that of automatic speech recognition (ASR). In this paper, we have significantly enhanced the performance of our recently published framework by using manual alignments for initialization instead of the flat start technique. We have also improved the NSR system performance by accounting for the possible…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Blind Source Separation Techniques