Hierarchical Modeling of Spatial Cues via Spherical Harmonics for Multi-Channel Speech Enhancement

Jiahui Pan; Shulin He; Hui Zhang; Xueliang Zhang

arXiv:2309.10393·cs.SD·September 20, 2023

TL;DR

This paper introduces a hierarchical spherical harmonic transform-based approach for multi-channel speech enhancement, explicitly modeling spatial cues to improve performance with fewer parameters.

Contribution

It presents a hierarchical framework that explicitly models spatial cues with spherical harmonics to improve multi-channel speech enhancement.

Findings

01 Effective recovery of target spatial patterns

02 Improved performance over baseline models

03 Fewer parameters and computations needed

Abstract

Multi-channel speech enhancement utilizes spatial information from multiple microphones to extract the target speech. However, most existing methods do not explicitly model spatial cues, instead relying on implicit learning from multi-channel spectra. To better leverage spatial information, we propose explicitly incorporating spatial modeling by applying spherical harmonic transforms (SHT) to the multi-channel input. In detail, a hierarchical framework is introduced whereby lower order harmonics capturing broader spatial patterns are estimated first, then combined with higher orders to recursively predict finer spatial details. Experiments on TIMIT demonstrate the proposed method can effectively recover target spatial patterns and achieve improved performance over baseline models, using fewer parameters and computations. Explicitly modeling spatial information hierarchically enables…
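
To make the idea of applying an SHT to a multi-channel input concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical six-microphone array with one capsule on each coordinate axis, real spherical harmonics truncated at order 1, and a least-squares (pseudoinverse) transform; it ignores radial (mode-strength) terms and any neural network. The names `real_sh_basis`, `sht`, and `split_by_order` are illustrative only.

```python
import numpy as np


def real_sh_basis(azimuth, colatitude):
    """Real spherical harmonics up to order 1, evaluated at each direction.

    Returns an array of shape (num_dirs, 4) with columns ordered as
    Y_0^0, Y_1^-1, Y_1^0, Y_1^1 (orthonormal normalization).
    """
    azimuth = np.asarray(azimuth, dtype=float)
    colatitude = np.asarray(colatitude, dtype=float)
    x = np.sin(colatitude) * np.cos(azimuth)
    y = np.sin(colatitude) * np.sin(azimuth)
    z = np.cos(colatitude)
    c0 = 0.5 * np.sqrt(1.0 / np.pi)      # order-0 constant
    c1 = np.sqrt(3.0 / (4.0 * np.pi))    # order-1 constant
    return np.stack([c0 * np.ones_like(x), c1 * y, c1 * z, c1 * x], axis=-1)


def sht(mic_spectra, basis):
    """Least-squares SHT over the microphone axis (axis 0).

    mic_spectra: (num_mics, freq, time) -> coefficients: (num_coeffs, freq, time)
    """
    return np.tensordot(np.linalg.pinv(basis), mic_spectra, axes=1)


def split_by_order(coeffs, max_order):
    """Group coefficients by order n, which occupies indices n^2 .. (n+1)^2 - 1."""
    return [coeffs[n * n:(n + 1) * (n + 1)] for n in range(max_order + 1)]


# Assumed geometry: capsules at +x, -x, +y, -y, +z, -z (not taken from the paper).
azimuth = np.array([0.0, np.pi, np.pi / 2, -np.pi / 2, 0.0, 0.0])
colatitude = np.array([np.pi / 2, np.pi / 2, np.pi / 2, np.pi / 2, 0.0, np.pi])
basis = real_sh_basis(azimuth, colatitude)  # shape (6, 4)

# Synthetic complex spectra built from known coefficients (5 bins, 3 frames).
rng = np.random.default_rng(0)
true_coeffs = rng.standard_normal((4, 5, 3)) + 1j * rng.standard_normal((4, 5, 3))
mic_spectra = np.tensordot(basis, true_coeffs, axes=1)  # shape (6, 5, 3)

coeffs = sht(mic_spectra, basis)
order0, order1 = split_by_order(coeffs, max_order=1)
```

Here `order0` holds the single omnidirectional coefficient (the broadest spatial pattern) and `order1` holds the three first-order coefficients. A hierarchical model in the spirit of the abstract would estimate the target's `order0` first and then use it alongside the higher-order inputs to predict `order1`; how that recursion is implemented is specified in the paper, not here.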

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Hearing Loss and Rehabilitation