In Silico Modeling of the RAMPHO Buffer: Dissociating Informational and Energetic Masking via Phonetic Entropy in Deep Neural Networks
Stefan Bleeck

TL;DR
This paper models the cognitive side of multi-talker listening using the frame-by-frame phonetic entropy of a self-supervised acoustic model (wav2vec 2.0), separating informational masking from energetic masking.
Contribution
It introduces an in silico simulation of the RAMPHO buffer with phonetic entropy, revealing cognitive-acoustic trade-offs in speech masking.
Findings
Semantic distractor removal reduces informational masking at high SNRs.
Phonetic entropy effectively dissociates informational and energetic masking.
Simulation highlights a Pareto optimization between semantic content and temporal cues.
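To make the phonetic entropy measure concrete, here is a minimal sketch, assuming it is the Shannon entropy of the per-frame posterior that an acoustic model assigns over phonetic classes. The summary does not give the paper's exact definition, so the function name `frame_entropy`, the `(frames, classes)` logit layout, and the choice of bits as the unit are illustrative assumptions rather than the author's implementation.

```python
import numpy as np


def frame_entropy(logits: np.ndarray) -> np.ndarray:
    """Shannon entropy in bits of the posterior at each frame.

    `logits` is assumed to have shape (frames, classes), e.g. the
    unnormalized outputs of a frame-level phonetic classifier.
    """
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    p = np.exp(log_p)
    # H = -sum_k p_k * ln(p_k), converted from nats to bits.
    return -(p * log_p).sum(axis=-1) / np.log(2.0)
```

Under this reading, a frame where the model is confident about the phone yields entropy near zero, while a frame where competing speech leaves the model undecided among K classes approaches log2(K) bits.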
Abstract
The fundamental challenge of listening in multi-talker environments is a cognitive bottleneck, defined by the Ease of Language Understanding (ELU) model as a failure within the RAMPHO episodic buffer. Current deep neural networks for speech enhancement optimize purely for physical acoustics, failing to account for the cognitive penalty of informational masking. Here, we present an in silico simulation of the RAMPHO buffer using the frame-by-frame phonetic entropy of a self-supervised acoustic model (wav2vec 2.0). By contrasting a semantically intact distractor with a phase-decorrelated distractor (the Concentration Shield) across a signal-to-noise ratio (SNR) sweep, we successfully dissociate the cognitive penalty of informational distraction from the physical penalty of energetic decay. The simulation reveals a cognitive-acoustic Pareto optimization problem: destroying a distractor's…
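The stimulus construction described in the abstract can be sketched as follows. This is a hedged illustration only: the summary does not say how the phase-decorrelated distractor is built, so the sketch assumes global Fourier phase randomization with the magnitude spectrum held fixed, and the helper names `phase_randomize` and `mix_at_snr` are hypothetical.

```python
import numpy as np


def phase_randomize(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a signal with the same magnitude spectrum as `x` but random phases.

    Assumption: global phase randomization stands in for the paper's
    phase-decorrelated distractor, whose construction is not given here.
    """
    spec = np.fft.rfft(x)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=spec.shape)
    out = np.abs(spec) * np.exp(1j * phase)
    out[0] = spec[0]  # keep the DC bin unchanged (it must be real)
    if x.size % 2 == 0:
        out[-1] = spec[-1]  # keep the Nyquist bin unchanged for even lengths
    return np.fft.irfft(out, n=x.size)


def mix_at_snr(target: np.ndarray, distractor: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `distractor` so the target-to-distractor power ratio equals `snr_db`."""
    p_target = np.mean(target ** 2)
    p_distractor = np.mean(distractor ** 2)
    gain = np.sqrt(p_target / (p_distractor * 10.0 ** (snr_db / 10.0)))
    return target + gain * distractor
```

Because the magnitude spectrum is unchanged, the intact and phase-randomized distractors have the same long-term power spectrum, so at a matched SNR any difference in model entropy between the two conditions can be attributed to the distractor's structure rather than its energy. Global phase randomization also flattens the distractor's temporal envelope, which is one reason to treat this as an approximation of the paper's manipulation.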
