A dataset and model for auditory scene recognition for hearing devices: AHEAD-DS and OpenYAMNet
Henry Zhong, Jörg M. Buchholz, Julian Maclaren, Simon Carlile, and Richard Lyon

TL;DR
This paper introduces AHEAD-DS, a standardized dataset for auditory scene recognition tailored for hearing devices, and OpenYAMNet, a lightweight model capable of real-time scene recognition on edge devices like smartphones.
Contribution
It provides a publicly accessible dataset with relevant labels and a deployable sound recognition model optimized for resource-constrained hearing aid applications.
Findings
OpenYAMNet achieved 0.86 mean average precision and 0.93 accuracy.
The model demonstrated real-time scene recognition on a Google Pixel 3.
Latency was approximately 50 ms to load the model, plus approximately 30 ms of processing per second of audio.
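To put the latency figures in context, the following is a minimal sketch of the arithmetic, not the authors' benchmarking code. It assumes that the model is loaded once, that processing time scales linearly with audio duration, and that the approximate figures reported above hold. The function names are hypothetical and chosen for illustration only.

```python
# Back-of-the-envelope latency arithmetic using the approximate figures
# reported above. Assumptions (not from the paper's code): the model is
# loaded once, and processing time grows linearly with audio duration.

LOAD_MS = 50.0        # approximate one-off model loading time
PER_SECOND_MS = 30.0  # approximate processing time per second of audio


def real_time_factor(processing_ms_per_audio_s: float) -> float:
    """Processing time divided by audio duration (1 s = 1000 ms).

    A value below 1.0 means the model keeps up with incoming audio.
    """
    return processing_ms_per_audio_s / 1000.0


def total_latency_ms(audio_seconds: float,
                     load_ms: float = LOAD_MS,
                     per_second_ms: float = PER_SECOND_MS) -> float:
    """Estimated latency for one clip, including the one-off model load."""
    return load_ms + per_second_ms * audio_seconds


if __name__ == "__main__":
    print(f"real-time factor: {real_time_factor(PER_SECOND_MS):.2f}")
    print(f"estimated latency for a 10 s clip: {total_latency_ms(10.0):.0f} ms")
```

Under these assumptions the real-time factor is about 0.03, which is consistent with the real-time operation reported for the Google Pixel 3.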
Abstract
Scene recognition is important for hearing devices; however, it is challenging, in part because of the limitations of existing datasets. Datasets often lack public accessibility, completeness, or audiologically relevant labels, hindering systematic comparison of machine learning models. Deploying such models on resource-constrained edge devices presents another challenge. The proposed solution is two-fold: a repacking and refinement of several open-source datasets to create AHEAD-DS, a dataset designed for auditory scene recognition for hearing devices, and the introduction of OpenYAMNet, a sound recognition model. AHEAD-DS aims to provide a standardised, publicly available dataset with consistent labels relevant to hearing aids, facilitating model comparison. OpenYAMNet is designed for deployment on edge devices like smartphones connected to hearing devices, such as hearing aids and wireless…
