TL;DR
This paper introduces a time-frequency scattering model combined with metric learning to accurately capture auditory similarities in instrumental techniques, surpassing previous methods in modeling timbre perception.
Contribution
It proposes a novel machine listening approach using joint time-frequency scattering features and LMNN metric learning to model complex timbre similarities across instruments and techniques.
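As an illustration of the metric learning step, the following is a minimal NumPy sketch of the standard large-margin nearest neighbor (LMNN) objective for a linear map, not the paper's implementation. The function names, the toy data, and the parameters `k`, `mu`, and `margin` are assumptions made for this example; in the paper's setting the rows of `X` would be scattering features and `y` the cluster assignments.

```python
import numpy as np

def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.maximum(D, 0.0)  # clip tiny negatives from round-off

def target_neighbors(X, y, k):
    """For each point, the k nearest same-label points in the input space."""
    D = pairwise_sq_dists(X)
    n = len(y)
    targets = []
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        targets.append(same[np.argsort(D[i, same])][:k])
    return targets

def lmnn_loss(L, X, y, k=1, mu=0.5, margin=1.0):
    """LMNN objective for the linear map L (hypothetical helper).

    pull: squared distances to target neighbors after mapping.
    push: hinge penalty whenever a differently labeled point (impostor)
          comes within `margin` of a target neighbor's distance.
    """
    Z = X @ L.T
    D = pairwise_sq_dists(Z)
    pull, push = 0.0, 0.0
    for i, neigh in enumerate(target_neighbors(X, y, k)):
        impostors = np.flatnonzero(y != y[i])
        for j in neigh:
            pull += D[i, j]
            push += np.sum(np.maximum(0.0, margin + D[i, j] - D[i, impostors]))
    return (1.0 - mu) * pull + mu * push

# Toy data (assumed): two well-separated clusters of two points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
y = np.array([0, 0, 1, 1])
loss_identity = lmnn_loss(np.eye(2), X, y)
loss_squashed = lmnn_loss(np.diag([0.01, 1.0]), X, y)  # collapses the separating axis
```

With the identity map the clusters are far apart, so only the pull term contributes. Shrinking the axis that separates the clusters brings impostors inside the margin and raises the loss, which is the behavior a learned metric is trained to avoid.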
Findings
Achieves 99.0% average precision at rank five (AP@5) in similarity retrieval
Outperforms existing models in modeling timbre perception
Ablation shows importance of scattering features and metric learning
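To make the retrieval metric concrete, here is a minimal sketch of one common reading of precision at rank k: for each query note, take its k nearest neighbors in feature space and count the fraction that share the query's cluster label, then average over all queries. The function name and toy data are assumptions for illustration, and the paper's exact evaluation protocol may differ.

```python
import numpy as np

def precision_at_k(features, labels, k=5):
    """Mean fraction of each query's k nearest neighbors sharing its label."""
    sq = np.sum(features ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(D, np.inf)                # a query never retrieves itself
    nearest = np.argsort(D, axis=1)[:, :k]     # indices of the k closest items
    hits = labels[nearest] == labels[:, None]  # True where the label matches
    return float(np.mean(hits))

# Toy data (assumed): two clusters of three one-dimensional points.
feats = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
labs = np.array([0, 0, 0, 1, 1, 1])
p_at_2 = precision_at_k(feats, labs, k=2)
p_at_3 = precision_at_k(feats, labs, k=3)
```

In this toy case every query has exactly two same-cluster neighbors, so the score is perfect at rank 2 and drops at rank 3, where one retrieved item necessarily comes from the other cluster.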
Abstract
Instrumental playing techniques such as vibratos, glissandos, and trills often denote musical expressivity, both in classical and folk contexts. However, most existing approaches to music similarity retrieval fail to describe timbre beyond the so-called "ordinary" technique, use instrument identity as a proxy for timbre quality, and do not allow for customization to the perceptual idiosyncrasies of a new subject. In this article, we ask 31 human subjects to organize 78 isolated notes into a set of timbre clusters. Analyzing their responses suggests that timbre perception operates within a more flexible taxonomy than those provided by instruments or playing techniques alone. In addition, we propose a machine listening model to recover the cluster graph of auditory similarities across instruments, mutes, and techniques. Our model relies on joint time-frequency scattering features to…
Taxonomy
Methods: Triplet Loss
