GRAM: Spatial general-purpose audio representations for real-world environments
Goksenin Yuksel, Marcel van Gerven, Kiki van der Heijden

TL;DR
GRAM is a spatial audio model that learns representations from multi-channel recordings. It performs well in real-world acoustic environments, supports tasks such as sound localization, and outperforms existing models while using less training data.
Contribution
The paper introduces GRAM, a multi-channel masked autoencoder for spatial audio, and provides standardized benchmarks demonstrating its superior performance in real-world acoustic tasks.
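To make the "multi-channel masked autoencoder" idea concrete, here is a minimal sketch of the masking step only, written with NumPy. It is not the authors' implementation. The function name `random_patch_mask`, the 16x16 patch size, the 0.75 mask ratio, the 2-channel input shape, and the choice to keep all channels together inside each patch are assumptions made purely for illustration.

```python
import numpy as np

def random_patch_mask(spec, patch=(16, 16), mask_ratio=0.75, seed=0):
    """Split a (channels, freq, time) spectrogram into patches and hide a random subset.

    Hypothetical helper for illustration. Returns (visible_patches, mask),
    where mask[i] is True for patches hidden from the encoder.
    """
    c, f, t = spec.shape
    pf, pt = patch
    if f % pf or t % pt:
        raise ValueError("freq and time dims must be divisible by the patch size")
    nf, nt = f // pf, t // pt

    # Assumption of this sketch: each patch carries all channels for the same
    # time-frequency region, so inter-channel (spatial) cues stay aligned.
    patches = (
        spec.reshape(c, nf, pf, nt, pt)
        .transpose(1, 3, 0, 2, 4)          # (nf, nt, c, pf, pt)
        .reshape(nf * nt, c * pf * pt)     # one flat vector per patch
    )

    n = nf * nt
    n_masked = int(round(n * mask_ratio))
    rng = np.random.default_rng(seed)
    mask = np.zeros(n, dtype=bool)
    mask[rng.permutation(n)[:n_masked]] = True  # hide a random subset of patches

    return patches[~mask], mask

# Hypothetical 2-channel input: 128 frequency bins, 64 time frames.
spec = np.random.default_rng(1).standard_normal((2, 128, 64))
visible, mask = random_patch_mask(spec)
print(visible.shape, int(mask.sum()))
```

In a masked autoencoder, an encoder would see only `visible`, and a decoder would be trained to reconstruct the patches where `mask` is True. Both networks are omitted here.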
Findings
Outperforms state-of-the-art models on NatHEAR and HEAR benchmarks.
Achieves high localization accuracy in simulated environments.
Generalizes effectively to real-world recordings in RealSELD.
Abstract
Audio foundation models learn general-purpose audio representations that facilitate a wide range of downstream tasks. While the performance of these models has greatly increased for conventional single-channel, dry audio clips, their success in real-world acoustic environments with reverberation and noise is limited. Furthermore, most audio foundation models ignore the spatial dimension of real-world acoustic environments, ruling out tasks involving sound localization. To address these limitations, we propose GRAM: a general-purpose real-world audio model that employs a multi-channel masked autoencoder to efficiently learn spatial audio representations. We evaluated GRAM and other audio foundation models in a standardized manner on high-quality simulations of naturalistic, spatial acoustic environments as well as recordings of real-world environments and release these two complementary…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
