Simultaneously Learning Speaker's Direction and Head Orientation from Binaural Recordings
Harshvardhan Takawale, Nirupam Roy

TL;DR
This paper introduces a neural network system that jointly estimates speaker and listener head orientations from binaural recordings, leveraging voice directivity and HRTF cues for applications in AR/VR and smart headphones.
Contribution
It presents a novel CNN-based approach for simultaneous prediction of speaker and listener orientations using ear-mounted microphone data.
Findings
Experiments demonstrate accurate joint prediction of speaker and listener orientations.
The model exploits frequency-dependent voice directivity and HRTF effects.
The approach is applicable to real-world earable device scenarios.
Abstract
Estimation of a speaker's direction and head orientation with binaural recordings can be a critical piece of information in many real-world applications with emerging 'earable' devices, including smart headphones and AR/VR headsets. However, it requires predicting the mutual head orientations of both the speaker and the listener, which is challenging in practice. This paper presents a system for jointly predicting speaker-listener head orientations by leveraging inherent human voice directivity and the listener's head-related transfer function (HRTF) as perceived by the ear-mounted microphones on the listener. We propose a convolutional neural network model that, given a binaural speech recording, can predict the orientation of both speaker and listener with respect to the line joining the two. The system builds on the core observation that the recordings from the left and right ears are…
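To make the idea of binaural cues concrete, here is a minimal sketch of two classic interaural measurements that HRTF effects give rise to: the level difference and the time lag between the ear signals. It is not the paper's method, which feeds binaural speech to a neural network. The function names, the synthetic 500 Hz tone, and the chosen gain and delay are all illustrative assumptions.

```python
import math

def make_tone(freq_hz, sample_rate, n_samples, gain=1.0, delay_samples=0):
    """Synthetic sine tone with a gain and an integer onset delay (hypothetical test signal)."""
    return [
        gain * math.sin(2 * math.pi * freq_hz * (n - delay_samples) / sample_rate)
        if n >= delay_samples else 0.0
        for n in range(n_samples)
    ]

def rms(signal):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def level_difference_db(left, right):
    """Interaural level difference in dB: positive when the left ear is louder."""
    return 20.0 * math.log10(rms(left) / rms(right))

def best_lag(left, right, max_lag):
    """Lag (in samples) by which `right` trails `left`, found by brute-force cross-correlation."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            left[n] * right[n + lag]
            for n in range(len(left))
            if 0 <= n + lag < len(right)
        )
        if score > best_score:
            best, best_score = lag, score
    return best

# Assumed scenario: the right ear hears the same tone at half amplitude, 3 samples later.
SAMPLE_RATE = 16000
left = make_tone(500.0, SAMPLE_RATE, 800)
right = make_tone(500.0, SAMPLE_RATE, 800, gain=0.5, delay_samples=3)

ild_db = level_difference_db(left, right)
lag = best_lag(left, right, max_lag=8)  # max_lag kept well below the 32-sample tone period
print(f"level difference: {ild_db:.2f} dB, lag: {lag} samples")
```

With these assumed inputs the script reports a level difference close to 6 dB and a lag of 3 samples. Real speech is broadband, so such cues vary with frequency, which is one reason a learned model operating on the full binaural recording is attractive.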
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Hearing Loss and Rehabilitation
Methods: Convolution
