Simultaneously Learning Speaker's Direction and Head Orientation from Binaural Recordings

Harshvardhan Takawale; Nirupam Roy

arXiv:2309.15064 · eess.AS · September 27, 2023

Open Access

TL;DR

This paper introduces a neural network system that jointly estimates speaker and listener head orientations from binaural recordings, leveraging voice directivity and HRTF cues for applications in AR/VR and smart headphones.

Contribution

It presents a novel CNN-based approach for simultaneous prediction of speaker and listener orientations using ear-mounted microphone data.

Findings

01. Accurate joint orientation predictions demonstrated in experiments.

02. Utilizes frequency-dependent voice directivity and HRTF effects.

03. Applicable to real-world earable device scenarios.

Abstract

Estimating a speaker's direction and head orientation from binaural recordings can provide critical information in many real-world applications with emerging 'earable' devices, including smart headphones and AR/VR headsets. However, it requires predicting the mutual head orientations of both the speaker and the listener, which is challenging in practice. This paper presents a system for jointly predicting speaker-listener head orientations by leveraging inherent human voice directivity and the listener's head-related transfer function (HRTF) as perceived by the ear-mounted microphones on the listener. We propose a convolutional neural network model that, given a binaural speech recording, can predict the orientation of both speaker and listener with respect to the line joining the two. The system builds on the core observation that the recordings from the left and right ears are…
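To make the idea of left/right ear differences concrete, here is a minimal sketch that computes two standard binaural cues per frequency bin: the interaural level difference (ILD) and the interaural phase difference (IPD). It is an illustration only. The abstract does not say which input features the authors feed to their network, and the function name `interaural_features` and its parameters (`n_fft`, `hop`) are hypothetical choices made for this example.

```python
import numpy as np

def interaural_features(left, right, sample_rate, n_fft=512, hop=256):
    """Return (freqs, ild_db, ipd) averaged over frames for a binaural pair.

    Hypothetical helper for illustration; not the paper's feature pipeline.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.shape != right.shape:
        raise ValueError("left and right channels must have the same length")
    if len(left) < n_fft:
        raise ValueError("signal is shorter than one analysis frame")

    window = np.hanning(n_fft)
    starts = range(0, len(left) - n_fft + 1, hop)

    # Short-time spectra, each of shape (n_frames, n_fft // 2 + 1).
    spec_l = np.array([np.fft.rfft(window * left[s:s + n_fft]) for s in starts])
    spec_r = np.array([np.fft.rfft(window * right[s:s + n_fft]) for s in starts])

    # ILD: ratio of frame-averaged power at each frequency, in decibels.
    eps = 1e-12  # guards against division by zero in silent bins
    power_l = np.mean(np.abs(spec_l) ** 2, axis=0)
    power_r = np.mean(np.abs(spec_r) ** 2, axis=0)
    ild_db = 10.0 * np.log10((power_l + eps) / (power_r + eps))

    # IPD: phase of the frame-averaged cross-spectrum between the two ears.
    ipd = np.angle(np.mean(spec_l * np.conj(spec_r), axis=0))

    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    return freqs, ild_db, ipd
```

For example, if the right channel is simply the left channel at half amplitude, the ILD is about 6.02 dB in every bin and the IPD is zero. Real recordings show frequency-dependent values instead, since both voice directivity and the HRTF vary with frequency, as the Findings note.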

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis · Hearing Loss and Rehabilitation

Methods: Convolution