Perceptual-Neural-Physical Sound Matching
Han Han, Vincent Lostanlen, Mathieu Lagrange

TL;DR
This paper introduces PNP, a novel loss function combining perceptual and physical modeling insights, enabling more effective neural sound matching, especially for complex nonstationary sounds like percussion.
Contribution
The paper proposes PNP, an efficient quadratic approximation of spectral loss that improves neural sound matching by combining physical modeling with the joint time-frequency scattering transform.
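To make "quadratic approximation of spectral loss" concrete, here is a minimal sketch of one standard way to build such an approximation: linearize the feature map around the target parameters and keep the resulting quadratic form (a Gauss-Newton style construction). Everything in it is an assumption made for illustration and is not taken from the paper: the one-partial `synth`, the log-magnitude spectrum used as a stand-in for joint time-frequency scattering, the finite-difference Jacobian, and all names and constants.

```python
import numpy as np

SR = 8000   # sample rate in Hz (assumed for illustration)
N = 2048    # number of samples (assumed for illustration)

def synth(theta):
    """Hypothetical toy synthesizer: a single exponentially damped sinusoid.
    theta = [log2 of frequency in Hz, decay rate in 1/s]."""
    log_f, decay = theta
    t = np.arange(N) / SR
    return np.exp(-decay * t) * np.sin(2 * np.pi * 2.0 ** log_f * t)

def features(x):
    """Stand-in for a perceptual representation: log-compressed magnitude
    spectrum. This is NOT joint time-frequency scattering."""
    return np.log1p(np.abs(np.fft.rfft(x)))

def jacobian(theta, eps=1e-4):
    """Central finite-difference Jacobian of features(synth(theta))."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        hi = features(synth(theta + step))
        lo = features(synth(theta - step))
        cols.append((hi - lo) / (2 * eps))
    return np.stack(cols, axis=1)  # shape: (n_features, n_params)

def quadratic_loss(theta_pred, theta_true, M):
    """Quadratic form (theta_pred - theta_true)^T M (theta_pred - theta_true)."""
    d = np.asarray(theta_pred, dtype=float) - np.asarray(theta_true, dtype=float)
    return float(d @ M @ d)

theta_true = np.array([np.log2(440.0), 20.0])
J = jacobian(theta_true)
M = J.T @ J  # depends only on the target, so it can be precomputed once

theta_pred = theta_true + np.array([0.001, 0.1])
approx = quadratic_loss(theta_pred, theta_true, M)
exact = float(np.sum((features(synth(theta_pred)) - features(synth(theta_true))) ** 2))
print(f"quadratic approximation: {approx:.6f}   exact feature-domain loss: {exact:.6f}")
```

In this sketch the matrix `M` is computed once per target, so evaluating the loss during training only needs the parameter error, not a pass through the synthesizer and the feature extractor. The approximation is only expected to track the exact loss for small parameter errors.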
Findings
PNP outperforms traditional loss functions in matching synthetic drum sounds.
PNP converges faster than spectral loss, at a speed comparable to P-loss, while capturing perceptually relevant features.
The method demonstrates potential for complex nonstationary sound matching.
Abstract
Sound matching algorithms seek to approximate a target waveform by parametric audio synthesis. Deep neural networks have achieved promising results in matching sustained harmonic tones. However, the task is more challenging when targets are nonstationary and inharmonic, e.g., percussion. We attribute this problem to the inadequacy of the loss function. On one hand, mean square error in the parametric domain, known as "P-loss", is simple and fast but fails to accommodate the differing perceptual significance of each parameter. On the other hand, mean square error in the spectrotemporal domain, known as "spectral loss", is perceptually motivated and serves in differentiable digital signal processing (DDSP). Yet, spectral loss is a poor predictor of pitch intervals and its gradient may be computationally expensive, hence slow convergence. Against this conundrum, we present…
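The contrast between the two losses described in the abstract can be illustrated with a small sketch. It assumes a hypothetical two-parameter synthesizer (`synth`, a damped sinusoid) and a plain Hann-windowed magnitude spectrogram; neither is the authors' drum model or their spectral representation, and all names and constants are chosen for illustration only. Two predictions are built with the same P-loss, one off in pitch and one off in decay rate, and the spectral loss is computed for each.

```python
import numpy as np

SR = 8000   # sample rate in Hz (assumed for illustration)
N = 2048    # number of samples (assumed for illustration)

def synth(theta):
    """Hypothetical toy synthesizer: a single exponentially damped sinusoid.
    theta = [log2 of frequency in Hz, decay rate in 1/s]."""
    log_f, decay = theta
    t = np.arange(N) / SR
    return np.exp(-decay * t) * np.sin(2 * np.pi * 2.0 ** log_f * t)

def spectrogram(x, win=256, hop=128):
    """Magnitude short-time Fourier transform with a Hann window."""
    window = np.hanning(win)
    frames = [x[i:i + win] * window for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

def p_loss(theta_pred, theta_true):
    """Mean square error in the parametric domain."""
    return float(np.mean((theta_pred - theta_true) ** 2))

def spectral_loss(theta_pred, theta_true):
    """Mean square error between spectrograms of the synthesized sounds."""
    diff = spectrogram(synth(theta_pred)) - spectrogram(synth(theta_true))
    return float(np.mean(diff ** 2))

theta_true = np.array([np.log2(440.0), 20.0])
pitch_error = theta_true + np.array([0.5, 0.0])   # half an octave too high
decay_error = theta_true + np.array([0.0, 0.5])   # decay rate off by 0.5 per second

for name, theta in [("pitch error", pitch_error), ("decay error", decay_error)]:
    print(f"{name}: P-loss = {p_loss(theta, theta_true):.4f}, "
          f"spectral loss = {spectral_loss(theta, theta_true):.4f}")
```

Both predictions receive the same P-loss because it treats every parameter alike, whereas the spectral loss separates them: the pitch error moves energy to different frequency bins, while the small decay error barely changes the spectrogram. The spectral loss also requires running the synthesizer and the transform for every prediction, which is the computational cost the abstract refers to.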
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
