Binaural Sound Event Localization and Detection based on HRTF Cues for Humanoid Robots

Gyeong-Tae Lee, Hyeonuk Nam, and Yong-Hwa Park

arXiv:2507.20530 · eess.AS · July 29, 2025

Open Access

TL;DR

This paper presents a binaural sound event localization and detection system inspired by human spatial hearing, utilizing a new feature representation and a synthetic dataset to improve accuracy in realistic auditory scenes.

Contribution

The paper introduces the BiSELD task, a novel BTFF feature representation, a CRNN-based model, and a synthetic benchmark dataset for binaural sound event localization and detection.

Findings

01. Achieved 87.1% F-score in sound event detection

02. Attained 4.4° average localization error

03. Demonstrated effectiveness of BTFF features in improving performance

Abstract

This paper introduces Binaural Sound Event Localization and Detection (BiSELD), a task that aims to jointly detect and localize multiple sound events using binaural audio, inspired by the spatial hearing mechanism of humans. To support this task, we present a synthetic benchmark dataset, called the Binaural Set, which simulates realistic auditory scenes using measured head-related transfer functions (HRTFs) and diverse sound events. To effectively address the BiSELD task, we propose a new input feature representation called the Binaural Time-Frequency Feature (BTFF), which encodes interaural time difference (ITD), interaural level difference (ILD), and high-frequency spectral cues (SC) from binaural signals. BTFF is composed of eight channels, including left and right mel-spectrograms, velocity-maps, SC-maps, and ITD-/ILD-maps, designed to cover different spatial cues across frequency…
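To make the two interaural cues named in the abstract concrete, here is a minimal sketch of broadband ITD and ILD estimation from a pair of ear signals. It is not the authors' BTFF pipeline, which the abstract describes as multi-channel time-frequency maps. The function names `estimate_itd` and `estimate_ild`, the 1 ms lag limit, and the sign conventions are assumptions chosen for illustration.

```python
import numpy as np


def estimate_itd(left, right, sr, max_itd=1e-3):
    """Broadband ITD in seconds from the cross-correlation peak.

    Assumes equal-length 1-D arrays. A positive value means the right
    channel lags the left one (assumed convention).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = len(left)
    max_lag = int(round(max_itd * sr))  # search window in samples
    lags = np.arange(-max_lag, max_lag + 1)
    corr = []
    for lag in lags:
        # Correlate left[k] with right[k + lag] over the overlapping region.
        l_seg = left[max(0, -lag): n - max(0, lag)]
        r_seg = right[max(0, lag): n - max(0, -lag)]
        corr.append(np.dot(l_seg, r_seg))
    return lags[int(np.argmax(corr))] / sr


def estimate_ild(left, right, eps=1e-12):
    """Broadband ILD in dB; positive means the left channel is louder."""
    p_left = np.mean(np.square(left)) + eps
    p_right = np.mean(np.square(right)) + eps
    return 10.0 * np.log10(p_left / p_right)


if __name__ == "__main__":
    # Toy scene: the right ear hears a delayed, attenuated copy of the left.
    rng = np.random.default_rng(0)
    sr = 16000
    src = rng.standard_normal(4096)
    delay = 8  # samples
    right = 0.5 * np.concatenate([np.zeros(delay), src[:-delay]])
    print("ITD (s):", estimate_itd(src, right, sr))
    print("ILD (dB):", estimate_ild(src, right))
```

In this toy scene the right-ear signal arrives later and weaker, so both estimates come out positive under the conventions assumed above. A per-band variant would apply the same two measurements to short-time spectra rather than to the whole waveform.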

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation