Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments

Wessel Ledder; Yuzhen Qin; Kiki van der Heijden

arXiv:2409.10048 · cs.SD · June 24, 2025

TL;DR

This paper introduces an audio-driven deep reinforcement learning framework for head-orientation in naturalistic environments, demonstrating high performance in anechoic conditions and analyzing generalization across reverberant settings.

Contribution

The paper presents a deep Q-learning approach to head-orientation control from stereo speech recordings and examines how the agent's performance generalizes across reverberant environments.

Findings

01 High accuracy in anechoic environments

02 Performance drops with reverberation but remains better than baseline

03 Generalization varies depending on training environment

Abstract

Although deep reinforcement learning (DRL) approaches in audio signal processing have seen substantial progress in recent years, audio-driven DRL for tasks such as navigation, gaze control and head-orientation control in the context of human-robot interaction has received little attention. Here, we propose an audio-driven DRL framework in which we utilise deep Q-learning to develop an autonomous agent that orients towards a talker in the acoustic environment based on stereo speech recordings. Our results show that the agent learned to perform the task at a near-perfect level when trained on speech segments in anechoic environments (that is, without reverberation). The presence of reverberation in naturalistic acoustic environments affected the agent's performance, although the agent still substantially outperformed a baseline, randomly acting agent. Finally, we quantified the degree of…
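To make the Q-learning idea in the abstract concrete, here is a minimal toy sketch. It is not the authors' implementation: it swaps the deep Q-network for a plain lookup table, and it swaps the stereo speech input for a directly observed, discretized talker angle. The bin count, action set, reward values, and hyperparameters are all hypothetical choices made for illustration.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
N_BINS = 12            # talker angle relative to the head, in 30-degree bins
ACTIONS = (-1, 0, 1)   # rotate left, stay, rotate right (one bin per step)


def step(state, action_idx):
    """Rotate the head by one action; return (next_state, reward, done)."""
    # Turning right by one bin reduces the talker's relative angle by one bin.
    next_state = (state - ACTIONS[action_idx]) % N_BINS
    done = next_state == 0                 # bin 0 means facing the talker
    reward = 1.0 if done else -0.1         # small penalty per step off-target
    return next_state, reward, done


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_BINS)]
    for _ in range(episodes):
        state = rng.randrange(1, N_BINS)   # start facing away from the talker
        for _ in range(50):                # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            nxt, r, done = step(state, a)
            # Q-learning target: reward plus discounted best next-state value.
            target = r if done else r + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_steps(q, state, limit=50):
    """Count steps the greedy policy needs to face the talker (capped)."""
    steps = 0
    while state != 0 and steps < limit:
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _, _ = step(state, a)
        steps += 1
    return steps
```

Calling `q = train()` and then `greedy_steps(q, 3)` reports how many rotations the learned greedy policy needs from a starting offset of three bins. In a deep Q-learning setting such as the one the abstract describes, the table would be replaced by a network that maps audio features to action values, while the update target keeps the same form.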

Code & Models

Repositories

humanandmachinehearing/audiodriven_drl_for_headorientationcontrol
PyTorch · Official

Taxonomy

Topics: Tactile and Sensory Interactions · Multisensory perception and integration · Color perception and design

Methods: Q-Learning