Learning Perceptually Relevant Temporal Envelope Morphing
Satvik Dixit, Sungjoon Park, Chris Donahue, Laurie M. Heller

TL;DR
This paper introduces a perceptually guided machine learning approach to temporal envelope morphing in audio, enabling more natural sound blending by learning from human listening studies and outperforming existing methods.
Contribution
It presents a novel workflow combining perceptual principles, large-scale datasets, and machine learning to improve audio envelope morphing quality.
Findings
Outperforms existing methods at producing natural-sounding intermediate morphs
Derives perceptual morphing principles from human listening studies
Develops benchmarks for evaluating envelope morphing
Abstract
Temporal envelope morphing, the process of interpolating between the amplitude dynamics of two audio signals, is an emerging problem in generative audio systems that lacks sufficient perceptual grounding. Morphing of temporal envelopes in a perceptually intuitive manner should enable new methods for sound blending in creative media and for probing perceptual organization in psychoacoustics. However, existing audio morphing techniques often fail to produce intermediate temporal envelopes when input sounds have distinct temporal structures; many morphers effectively overlay both temporal structures, leading to perceptually unnatural results. In this paper, we introduce a novel workflow for learning envelope morphing with perceptual guidance: we first derive perceptually grounded morphing principles through human listening studies, then synthesize large-scale datasets encoding these…
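As a rough illustration of the failure mode the abstract describes, the Python sketch below interpolates pointwise between two temporal envelopes whose events occur at different times. It is not the paper's method. The moving-average envelope estimator, the Gaussian test envelopes, and the function names (`amplitude_envelope`, `naive_envelope_morph`, `count_onsets`) are all assumptions made for illustration.

```python
import numpy as np


def amplitude_envelope(signal, sample_rate, window_ms=10.0):
    """Estimate a temporal envelope by rectifying the signal and
    smoothing it with a moving average (an assumed, simple estimator)."""
    window = max(1, int(sample_rate * window_ms / 1000.0))
    kernel = np.ones(window) / window
    return np.convolve(np.abs(signal), kernel, mode="same")


def naive_envelope_morph(env_a, env_b, alpha):
    """Pointwise linear interpolation: alpha=0 gives A, alpha=1 gives B."""
    return (1.0 - alpha) * env_a + alpha * env_b


def count_onsets(envelope, threshold):
    """Count upward threshold crossings, a crude proxy for event count."""
    above = envelope > threshold
    return int(np.sum(above[1:] & ~above[:-1]))


# Hypothetical test envelopes: one burst each, at different times.
sample_rate = 1000
t = np.arange(sample_rate) / sample_rate            # 1 second of samples
env_a = np.exp(-0.5 * ((t - 0.25) / 0.03) ** 2)     # burst at 0.25 s
env_b = np.exp(-0.5 * ((t - 0.75) / 0.03) ** 2)     # burst at 0.75 s

midpoint = naive_envelope_morph(env_a, env_b, alpha=0.5)

print(count_onsets(env_a, 0.25))     # one event in A
print(count_onsets(env_b, 0.25))     # one event in B
print(count_onsets(midpoint, 0.25))  # two half-height events in the blend
```

In this toy case each input contains one event, but the pointwise midpoint contains two half-height events at the original times rather than a single intermediate one. That is the overlay behavior the abstract attributes to many existing morphers.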
Taxonomy
Topics: Music Technology and Sound Studies · Neuroscience and Music Perception · Music and Audio Processing
