
TL;DR
This paper introduces a novel motion spatio-temporal sampling reconstruction theory that enables efficient, physically accurate simulation of moving sound sources, improving the dynamic training data available for speech enhancement systems.
Contribution
It proposes a new theory and hierarchical sampling strategy for realistic, real-time simulation of moving sound sources, surpassing traditional static methods.
Findings
More accurate amplitude and phase restoration in moving scenarios
Enables generation of high-quality dynamic training data for speech enhancement
Real-time simulation with reduced computational complexity
Abstract
Modern neural-network-based speech processing systems usually need to be robust to reverberation, so training them requires large amounts of reverberant data. Current practice tends either to approximate dynamic systems by sampling static ones or to supplement the data with real recordings. However, neither approach fundamentally solves the problem of simulating motion data that conforms to physical laws. To address the core issue of insufficient training data for speech enhancement models in moving scenarios, this paper proposes Yang's motion spatio-temporal sampling reconstruction theory, which enables efficient simulation of the continuous, time-varying reverberation produced by motion. This theory overcomes the limitations of the traditional static Image-Source Method (ISM) when applied to time-varying systems. By decomposing the impulse response of the…
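The abstract contrasts the proposed theory with the common practice of approximating a dynamic system by sampling static ones. The following is a minimal sketch of that baseline only, not the paper's method. It assumes one precomputed static room impulse response (RIR) per block of samples, for example from ISM evaluated at successive source positions along a trajectory. The function names `convolve` and `piecewise_static_reverb` are hypothetical and chosen for illustration.

```python
# Hypothetical sketch of the "sampled static system" baseline (NOT the paper's method):
# the dry signal is split into blocks, each block is convolved with the static RIR
# assumed for that block's source position, and the reverberant tails are overlap-added.

def convolve(x, h):
    """Direct-form linear convolution; the result has len(x) + len(h) - 1 samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def piecewise_static_reverb(signal, rirs, block_len):
    """Approximate a moving source by switching static RIRs every block_len samples.

    signal:    list of floats, the dry source signal
    rirs:      list of static RIRs (lists of floats), one per block (assumed trajectory sampling)
    block_len: number of input samples for which each static RIR is held constant
    """
    if not signal:
        return []
    n_blocks = (len(signal) + block_len - 1) // block_len
    if len(rirs) < n_blocks:
        raise ValueError("need one RIR per block")
    max_rir = max(len(h) for h in rirs[:n_blocks])
    out = [0.0] * (len(signal) + max_rir - 1)
    for k in range(n_blocks):
        start = k * block_len
        block = signal[start:start + block_len]
        wet = convolve(block, rirs[k])
        # Overlap-add: the tail of block k extends into later blocks' time range.
        for i, v in enumerate(wet):
            out[start + i] += v
    return out


if __name__ == "__main__":
    # Two blocks of two samples, each with its own (toy) static RIR.
    print(piecewise_static_reverb([1.0, 0.0, 0.0, 1.0], [[1.0, 0.5], [0.0, 1.0]], 2))
    # prints [1.0, 0.5, 0.0, 0.0, 1.0]
```

If every block uses the same RIR, the sketch reduces to ordinary static convolution. With different RIRs, each response is held constant across a whole block, so effects that vary within a block (such as Doppler shift) are ignored and the output can change abruptly at block boundaries. This illustrates why the abstract describes sampled static systems as unable to fully reproduce physically consistent motion.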
