Global HRTF Interpolation via Learned Affine Transformation of Hyper-conditioned Features
Jin Woo Lee, Sungho Lee, Kyogu Lee

TL;DR
This paper introduces a deep learning approach that uses learned affine transformations of hyper-conditioned features to interpolate and generalize HRTFs across different spatial datasets, improving efficiency and accuracy in binaural audio rendering.
Contribution
A novel deep learning architecture that enhances HRTF interpolation and generalization across datasets with varying spatial sampling distributions.
Findings
Improved cross-dataset HRTF interpolation accuracy.
Robust reconstruction of target HRTFs from sparse data.
Enhanced model generalizability across coordinate systems.
Abstract
Estimating Head-Related Transfer Functions (HRTFs) of arbitrary source points is essential in immersive binaural audio rendering. Computing each individual's HRTFs is challenging: traditional approaches are costly in time and computational resources, while modern data-driven approaches are data-hungry. For data-driven approaches in particular, existing HRTF datasets differ in the spatial sampling distributions of their source positions, which poses a major problem when generalizing a method across multiple datasets. To alleviate this, we propose a deep learning method based on a novel conditioning architecture. The proposed method can predict an HRTF of any position by interpolating the HRTFs of known distributions. Experimental results show that the proposed architecture improves the model's generalizability across datasets with various coordinate systems. Additional demonstrations show…
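The abstract does not spell out the conditioning architecture, so the following is only a generic sketch of what a feature-wise affine transformation driven by a condition can look like, not the authors' implementation. It assumes the condition is a target source direction, mapped to a Cartesian unit vector so that datasets with different angular conventions share one representation. The function names, the weight matrices `w_scale` and `w_shift`, and the feature sizes are all hypothetical.

```python
import numpy as np

def direction_to_unit_vector(azimuth_deg, elevation_deg):
    """Map azimuth/elevation in degrees to a Cartesian unit vector (assumed convention)."""
    az = np.deg2rad(azimuth_deg)
    el = np.deg2rad(elevation_deg)
    return np.stack(
        [np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)],
        axis=-1,
    )

def affine_condition(features, condition, w_scale, w_shift):
    """Scale and shift each feature channel using values predicted from the condition.

    features:  (batch, channels)
    condition: (batch, cond_dim)
    w_scale, w_shift: (cond_dim, channels) linear maps (hypothetical, would be learned)
    """
    scale = 1.0 + condition @ w_scale  # centred at 1 so zero weights give the identity
    shift = condition @ w_shift
    return scale * features + shift

# Toy example with random (untrained) weights.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8))
condition = direction_to_unit_vector(
    np.array([0.0, 90.0, 180.0, 270.0]), np.zeros(4)
)
w_scale = 0.1 * rng.standard_normal((3, 8))
w_shift = 0.1 * rng.standard_normal((3, 8))
out = affine_condition(features, condition, w_scale, w_shift)
```

In a trained model the two linear maps would typically be replaced by a small network, and the transformed features would feed later layers that predict the HRTF at the queried position.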
Taxonomy
Topics: Hearing Loss and Rehabilitation · Speech and Audio Processing · Advanced Vision and Imaging
