Misophonia Trigger Sound Detection on Synthetic Soundscapes Using a Hybrid Model with a Frozen Pre-Trained CNN and a Time-Series Module

Kurumi Sashida; Gouhei Tanaka

arXiv:2602.06271 · cs.SD · February 9, 2026

TL;DR

This paper develops hybrid deep learning models that pair a frozen pre-trained CNN with one of several time-series modules to detect misophonia trigger sounds in synthetic soundscapes, as a step toward assistive technologies.

Contribution

The paper introduces a hybrid model architecture for trigger sound detection using synthetic data, compares several temporal modules, and highlights lightweight options such as BiESN for personalization.
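
To make the idea of a lightweight temporal module concrete, here is a minimal sketch of a bidirectional reservoir, assuming BiESN denotes a bidirectional echo state network. It is not the authors' implementation. The function names, reservoir size, weight scaling, and toy input frames are all hypothetical, and the frames merely stand in for features that a frozen CNN would produce. The reservoir weights are random and fixed, so a trained system would fit only a linear readout on the concatenated states.

```python
import math
import random


def make_reservoir(n_in, n_res, scale=0.5, seed=0):
    """Build fixed random input and recurrent weights; they are never trained."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_res)]
    # Dividing by n_res keeps each row's absolute sum below `scale`, so the
    # recurrent map is contractive (a crude stand-in for spectral-radius scaling).
    w_res = [[rng.uniform(-scale, scale) / n_res for _ in range(n_res)]
             for _ in range(n_res)]
    return w_in, w_res


def run_reservoir(frames, w_in, w_res):
    """Return one reservoir state per input frame, starting from a zero state."""
    state = [0.0] * len(w_in)
    states = []
    for u in frames:
        state = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[i], u))
                + sum(w * s for w, s in zip(w_res[i], state))
            )
            for i in range(len(w_in))
        ]
        states.append(state)
    return states


def bidirectional_states(frames, fwd, bwd):
    """Concatenate forward states with time-aligned backward states."""
    forward = run_reservoir(frames, *fwd)
    # Run on the reversed sequence, then flip back to the original time order.
    backward = run_reservoir(frames[::-1], *bwd)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Toy input: 5 frames of 2-dimensional features (placeholder for CNN embeddings).
frames = [[0.1 * t, 1.0 - 0.1 * t] for t in range(5)]
fwd = make_reservoir(n_in=2, n_res=4, seed=1)
bwd = make_reservoir(n_in=2, n_res=4, seed=2)
features = bidirectional_states(frames, fwd, bwd)
```

Each entry of `features` holds 8 values: 4 from the forward pass and 4 from the backward pass at the same time step, so a per-frame classifier sees context from both directions.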

Findings

01. Bidirectional temporal models improve detection accuracy.

02. BiGRU achieves the best overall performance.

03. BiESN offers competitive results with fewer trainable parameters.
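
The gap in trainable parameters can be illustrated with simple counting. The sketch below assumes the PyTorch `nn.GRU` layout (three gates, each with input weights, recurrent weights, and two bias vectors) and an echo state network whose only trained part is a linear readout. The feature dimension, hidden size, and reservoir size are hypothetical and are not taken from the paper.

```python
def gru_params(input_size, hidden_size, bidirectional=True):
    """Trainable parameters of one GRU layer under the PyTorch nn.GRU layout."""
    per_direction = 3 * (
        hidden_size * input_size      # input-to-hidden weights
        + hidden_size * hidden_size   # hidden-to-hidden weights
        + 2 * hidden_size             # two bias vectors per gate
    )
    return per_direction * (2 if bidirectional else 1)


def esn_readout_params(reservoir_size, n_outputs, bidirectional=True):
    """Trainable parameters of an ESN: only the linear readout is fitted."""
    state_size = reservoir_size * (2 if bidirectional else 1)
    return (state_size + 1) * n_outputs  # +1 for the readout bias


# Hypothetical sizes: 512-dimensional CNN features, 128 units, one output.
bigru = gru_params(input_size=512, hidden_size=128)
biesn = esn_readout_params(reservoir_size=128, n_outputs=1)
print(bigru, biesn)  # 493056 257
```

The count covers the temporal module only. A BiGRU would also need its own output layer, and a practical reservoir is often much larger than a GRU hidden state, so the real ratio depends on the configuration used.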

Abstract

Misophonia is a disorder characterized by a decreased tolerance to specific everyday sounds (trigger sounds) that can evoke intense negative emotional responses such as anger, panic, or anxiety. These reactions can substantially impair daily functioning and quality of life. Assistive technologies that selectively detect trigger sounds could help reduce distress and improve well-being. In this study, we investigate sound event detection (SED) to localize intervals of trigger sounds in continuous environmental audio as a foundational step toward such assistive support. Motivated by the scarcity of real-world misophonia data, we generate synthetic soundscapes tailored to misophonia trigger sound detection using audio synthesis techniques. Then, we perform trigger sound detection tasks using hybrid CNN-based models. The models combine feature extraction using a frozen pre-trained CNN…
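
Localizing trigger intervals usually means turning frame-wise detector scores into start and end times. The following sketch shows one common post-processing step, plain thresholding followed by merging consecutive active frames. The threshold, hop size, and function name are assumptions for illustration, and the source does not say which post-processing the authors use.

```python
def frames_to_intervals(probs, threshold=0.5, hop_seconds=0.02):
    """Merge consecutive frames at or above `threshold` into (start, end) times."""
    intervals = []
    start = None
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and start is None:
            start = i  # an event begins at this frame
        elif not active and start is not None:
            intervals.append((start * hop_seconds, i * hop_seconds))
            start = None
    if start is not None:
        # The event is still active when the clip ends.
        intervals.append((start * hop_seconds, len(probs) * hop_seconds))
    return intervals


# Toy frame-wise scores for a single trigger class, one frame every 0.5 s.
scores = [0.1, 0.8, 0.9, 0.2, 0.7, 0.6]
events = frames_to_intervals(scores, threshold=0.5, hop_seconds=0.5)
print(events)  # [(0.5, 1.5), (2.0, 3.0)]
```

Real systems often add smoothing, such as a median filter over the scores, before thresholding to suppress very short spurious events.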

Taxonomy

Topics: Music and Audio Processing · Neuroscience and Music Perception · Voice and Speech Disorders