Towards Perception-Informed Latent HRTF Representations

You Zhang; Andrew Francl; Ruohan Gao; Paul Calamia; Zhiyao Duan; Ishwarya Ananthabhotla

arXiv:2507.02815·eess.AS·January 27, 2026

TL;DR

This paper introduces a perception-informed latent space for HRTF representations, in which distances between embeddings are meant to reflect perceptual differences rather than only spectral reconstruction error, with the aim of improving HRTF personalization for spatial audio.

Contribution

The work proposes a metric-based loss and MMDS supervision to embed HRTFs into a perceptually aligned latent space, with the goal of improving HRTF personalization.
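To make the idea of supervising a latent space with perceptual distances concrete, here is a minimal sketch, assuming MMDS refers to metric multidimensional scaling and that its objective is the usual raw stress between latent and target distances. The function names, the toy embeddings, and the distance values are all hypothetical; this is not the authors' loss, only an illustration of the general technique.

```python
import math


def pairwise_distances(points):
    """Euclidean distance between every pair of latent vectors."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def mds_stress(latents, perceptual_dist):
    """Raw metric-MDS stress: sum over i < j of (latent distance - target distance)^2.

    Assumption: `perceptual_dist` is a symmetric matrix of distances produced by
    some perceptual metric; the specific metric is not defined here.
    """
    d = pairwise_distances(latents)
    n = len(latents)
    return sum(
        (d[i][j] - perceptual_dist[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical toy data: three HRTFs embedded in 2-D.
latents = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

# Made-up perceptual distances that the embedding reproduces exactly.
matched = [
    [0.0, 5.0, 10.0],
    [5.0, 0.0, 5.0],
    [10.0, 5.0, 0.0],
]

# Same matrix with one pair off by 1.0, so the stress becomes nonzero.
mismatched = [
    [0.0, 4.0, 10.0],
    [4.0, 0.0, 5.0],
    [10.0, 5.0, 0.0],
]

stress_matched = mds_stress(latents, matched)
stress_mismatched = mds_stress(latents, mismatched)
```

A stress of zero means the latent geometry reproduces the target distances exactly; in a learned model, a term of this kind would be minimized alongside whatever other objectives the model uses.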

Findings

01. Traditional HRTF representations correlate poorly with perceptual metrics.

02. Perception-informed embeddings improve HRTF personalization.

03. Perceptually aligned representations can enhance spatial audio experiences.

Abstract

Personalized head-related transfer functions (HRTFs) are essential for ensuring a realistic auditory experience over headphones, because they take into account individual anatomical differences that affect listening. Most machine learning approaches to HRTF personalization rely on a learned low-dimensional latent space to generate or select custom HRTFs for a listener. However, these latent representations are typically learned in a manner that optimizes for spectral reconstruction but not for perceptual compatibility, meaning they may not necessarily align with perceptual distance. In this work, we first study whether traditionally learned HRTF representations are well correlated with perceptual relations using auditory-based objective perceptual metrics; we then propose a method for explicitly embedding HRTFs into a perception-informed latent space, leveraging a metric-based loss…
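The first question raised in the abstract, whether distances in a learned latent space track perceptual distances, can be illustrated with a simple correlation check. The sketch below assumes a Pearson correlation over the unique pairs of two distance matrices; the choice of Pearson, the helper names, and the toy matrices are assumptions for illustration, not the evaluation protocol of the paper.

```python
import math


def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def distance_correlation(latent_dist, perceptual_dist):
    """Correlate latent-space distances with perceptual distances pair by pair."""
    return pearson(upper_triangle(latent_dist), upper_triangle(perceptual_dist))


# Hypothetical latent-space distances between three HRTFs.
latent_dist = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
]

# Made-up perceptual distances that scale linearly with the latent ones.
aligned = [
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 6.0],
    [4.0, 6.0, 0.0],
]

# Made-up perceptual distances with the opposite ordering.
reversed_order = [
    [0.0, 6.0, 4.0],
    [6.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]

r_aligned = distance_correlation(latent_dist, aligned)
r_reversed = distance_correlation(latent_dist, reversed_order)
```

A correlation near 1 would indicate that the latent geometry preserves perceptual ordering and scale, while values near 0 or below would indicate the kind of mismatch the abstract describes for reconstruction-only representations.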
