Misophonia Trigger Sound Detection on Synthetic Soundscapes Using a Hybrid Model with a Frozen Pre-Trained CNN and a Time-Series Module
Kurumi Sashida, Gouhei Tanaka

TL;DR
This paper develops a hybrid deep learning model combining a frozen pre-trained CNN and various time-series modules to detect misophonia trigger sounds in synthetic soundscapes, aiming to aid assistive technologies.
Contribution
The paper introduces a hybrid model architecture for trigger sound detection on synthetic data and compares several temporal modules, highlighting lightweight options such as BiESN for personalization.
Findings
Bidirectional temporal models improve detection accuracy.
BiGRU achieves the best overall performance.
BiESN offers competitive results with fewer trainable parameters.
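To make the "fewer trainable parameters" point concrete, the following is a rough counting sketch, not the authors' configuration. It assumes a standard GRU cell with one bias vector per gate, and an echo state network whose reservoir weights stay fixed so that only a linear readout is trained. The function names and all dimensions are hypothetical and chosen for illustration only.

```python
def gru_params(input_dim: int, hidden_dim: int) -> int:
    """Trainable parameters of one unidirectional GRU layer.

    Three gates (reset, update, candidate), each with input weights,
    recurrent weights, and one bias vector. Some libraries (e.g. PyTorch)
    keep two bias vectors per gate, which adds 3 * hidden_dim more.
    """
    return 3 * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)


def readout_params(feature_dim: int, n_classes: int) -> int:
    """Weights plus bias of a linear frame-wise output layer."""
    return (feature_dim + 1) * n_classes


def bigru_trainable(input_dim: int, hidden_dim: int, n_classes: int) -> int:
    """Forward GRU + backward GRU + readout over the concatenated states."""
    return 2 * gru_params(input_dim, hidden_dim) + readout_params(2 * hidden_dim, n_classes)


def biesn_trainable(reservoir_dim: int, n_classes: int) -> int:
    """Assumption: reservoir weights are fixed, so only the readout over the
    concatenated forward and backward reservoir states is trained."""
    return readout_params(2 * reservoir_dim, n_classes)


if __name__ == "__main__":
    # Hypothetical sizes: 512-dim CNN features, 128 recurrent units, 10 classes.
    print("BiGRU trainable:", bigru_trainable(512, 128, 10))
    print("BiESN trainable:", biesn_trainable(128, 10))
```

Under these assumptions the BiESN count does not depend on the input feature dimension at all, which is why a reservoir-based module can be attractive when a model has to be adapted per user.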
Abstract
Misophonia is a disorder characterized by a decreased tolerance to specific everyday sounds (trigger sounds) that can evoke intense negative emotional responses such as anger, panic, or anxiety. These reactions can substantially impair daily functioning and quality of life. Assistive technologies that selectively detect trigger sounds could help reduce distress and improve well-being. In this study, we investigate sound event detection (SED) to localize intervals of trigger sounds in continuous environmental audio as a foundational step toward such assistive support. Motivated by the scarcity of real-world misophonia data, we generate synthetic soundscapes tailored to misophonia trigger sound detection using audio synthesis techniques. Then, we perform trigger sound detection tasks using hybrid CNN-based models. The models combine feature extraction using a frozen pre-trained CNN…
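Localizing intervals of trigger sounds usually means turning frame-wise detector outputs into onset and offset times. The sketch below shows one common way to do that by thresholding per-frame probabilities and merging consecutive active frames. The function name `frames_to_intervals`, the threshold, and the hop size are assumptions for illustration; the abstract as shown here does not specify the post-processing used in the paper.

```python
from typing import List, Sequence, Tuple


def frames_to_intervals(
    probs: Sequence[float],
    threshold: float = 0.5,
    hop_seconds: float = 0.02,
) -> List[Tuple[float, float]]:
    """Convert per-frame probabilities for one sound class into
    (onset, offset) intervals in seconds.

    A frame is active when its probability is >= threshold; each run of
    consecutive active frames becomes one interval.
    """
    intervals: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the current active run
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            intervals.append((start * hop_seconds, i * hop_seconds))
            start = None
    if start is not None:
        # The run extends to the end of the clip.
        intervals.append((start * hop_seconds, len(probs) * hop_seconds))
    return intervals


if __name__ == "__main__":
    # Hypothetical detector output with a 1-second hop for readability.
    print(frames_to_intervals([0.1, 0.9, 0.8, 0.2, 0.7], hop_seconds=1.0))
```

In practice, a smoothing step such as median filtering is often applied before thresholding to suppress single-frame flickers; that step is omitted here to keep the sketch short.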
Taxonomy
Topics: Music and Audio Processing · Neuroscience and Music Perception · Voice and Speech Disorders
