AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio
Xiaoyang Huang, Yanjun Wang, Yang Liu, Bingbing Ni, Wenjun Zhang, Jinxian Liu, Teng Li

TL;DR
This paper introduces a novel approach for personalized spatial audio by reconstructing 3D ear shapes from single-view images using new datasets and a depth-guided reconstruction method, enabling more accurate sound localization.
Contribution
The work presents AudioEar3D and AudioEar2D datasets, a depth-guided ear reconstruction model, and a pipeline integrating reconstructed ears with 3D body models for personalized HRTF simulation.
Findings
AudioEar3D and AudioEar2D are the largest high-quality datasets for ear reconstruction.
The proposed AudioEarM method achieves accurate 3D ear reconstruction from single images.
The pipeline enables personalized spatial audio rendering with improved accuracy.
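To make the last finding concrete, here is a minimal sketch of the final rendering step only, assuming a pair of head-related impulse responses (HRIRs, the time-domain form of an HRTF) is already available for the desired source direction. It is not the paper's pipeline, which obtains personalized HRTFs by simulation on the reconstructed geometry. The function name `render_binaural` and the toy HRIR values are hypothetical, chosen for illustration.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair.

    Returns an array of shape (len(mono) + len(hrir) - 1, 2),
    column 0 = left ear, column 1 = right ear.
    Assumes both HRIRs have the same length.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

# Toy HRIRs (made-up values, not measured or simulated data):
# a source on the listener's right reaches the right ear first and at
# full level, and the left ear two samples later at half level.
hrir_right = np.array([1.0, 0.0, 0.0, 0.0])
hrir_left = np.array([0.0, 0.0, 0.5, 0.0])

mono = np.array([1.0, -1.0, 0.5])
stereo = render_binaural(mono, hrir_left, hrir_right)
```

In this sketch, personalization would amount to swapping a generic HRIR pair for one derived from the listener's own ear and body shape; the convolution step itself stays the same.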
Abstract
Spatial audio, which focuses on immersive 3D sound rendering, is widely applied in the acoustic industry. One of the key problems of current spatial audio rendering methods is the lack of personalization based on different anatomies of individuals, which is essential to produce accurate sound source positions. In this work, we address this problem from an interdisciplinary perspective. The rendering of spatial audio is strongly correlated with the 3D shape of human bodies, particularly ears. To this end, we propose to achieve personalized spatial audio by reconstructing 3D human ears with single-view images. First, to benchmark the ear reconstruction task, we introduce AudioEar3D, a high-quality 3D ear dataset consisting of 112 point cloud ear scans with RGB images. To self-supervisedly train a reconstruction model, we further collect a 2D ear dataset composed of 2,000 images, each one…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Reconstructive Facial Surgery Techniques
