Musical Mix Clarity Prediction using Decomposition and Perceptual Masking Thresholds
Andrew Parker, Steven Fenton

TL;DR
This paper introduces a perceptual model that decomposes music signals into components, calculates masking thresholds, and predicts perceived mix clarity with high correlation to subjective scores, aiding music mixing and retrieval.
Contribution
The work presents a novel perceptual model for mix clarity based on signal decomposition and masking thresholds, validated against subjective ratings.
Findings
Best model variant achieved Spearman's rho = 0.8382
Masking thresholds correlate strongly with subjective clarity scores
Noise-like residuals negatively impact perceived mix clarity
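The headline result is a Spearman rank correlation between model output and listener ratings. As a minimal sketch of what that statistic measures, the pure-Python code below ranks both score lists (averaging tied ranks) and takes the Pearson correlation of the ranks. The function names and the example scores are hypothetical illustrations, not the paper's code or data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for illustration only (not taken from the paper).
model_scores = [0.12, 0.35, 0.33, 0.58, 0.71, 0.90]
listener_scores = [1.5, 2.0, 2.8, 3.1, 4.4, 4.0]
print(f"rho = {spearman_rho(model_scores, listener_scores):.4f}")
```

Because only the ordering of the scores matters, rho rewards a model that ranks mixes in the same order as listeners even if its output is not linearly related to their ratings.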
Abstract
Objective measurement of perceptually motivated music attributes has applications in both target-driven mixing and mastering methodologies and music information retrieval. This work proposes a perceptual model of mix clarity which decomposes a mixed input signal into transient, steady-state, and residual components. Masking thresholds are calculated for each component, and their relative relationship is used to determine an overall masking score as the model's output. Three variants of the model were tested against subjective mix clarity scores gathered from a controlled listening test. The best-performing variant achieved a Spearman's rank correlation of rho = 0.8382 (p<0.01). Furthermore, the model output was analysed using an independent dataset generated by progressively applying degradation effects to the test stimuli. Analysis of the model suggested a close relationship between the…
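The abstract does not say how the transient, steady-state, and residual components are obtained. One common way to get such a three-way split, shown here purely as an assumed stand-in and not as the authors' method, is median filtering of a magnitude spectrogram: smoothing along time favours steady-state energy, smoothing along frequency favours transients, and bins where neither clearly dominates go to the residual. The function names, the `width` and `beta` parameters, and the toy spectrogram are all hypothetical.

```python
from statistics import median


def median_filter_1d(seq, width):
    """Median filter with a window that is truncated at the edges."""
    half = width // 2
    return [median(seq[max(0, i - half): i + half + 1]) for i in range(len(seq))]


def decompose(spec, width=3, beta=2.0):
    """Split a magnitude spectrogram spec[freq][time] into three masks' worth
    of energy: (transient, steady, residual). Each bin is assigned whole to
    exactly one component, so the three outputs sum back to spec.

    beta is an assumed separation factor: a bin counts as steady-state only
    if its time-smoothed estimate exceeds beta times the frequency-smoothed
    estimate, and vice versa for transients.
    """
    n_freq, n_time = len(spec), len(spec[0])
    # Median along time keeps horizontal (sustained) structure.
    steady_est = [median_filter_1d(row, width) for row in spec]
    # Median along frequency keeps vertical (broadband, short) structure.
    columns = [[spec[f][t] for f in range(n_freq)] for t in range(n_time)]
    filtered_cols = [median_filter_1d(col, width) for col in columns]
    transient_est = [[filtered_cols[t][f] for t in range(n_time)]
                     for f in range(n_freq)]

    transient = [[0.0] * n_time for _ in range(n_freq)]
    steady = [[0.0] * n_time for _ in range(n_freq)]
    residual = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            s, p, x = steady_est[f][t], transient_est[f][t], spec[f][t]
            if s > beta * p:
                steady[f][t] = x
            elif p > beta * s:
                transient[f][t] = x
            else:
                residual[f][t] = x  # ambiguous or noise-like energy
    return transient, steady, residual


# Toy spectrogram: a sustained tone (row 2) crossed by a click (column 3).
toy = [[1.0 if (f == 2 or t == 3) else 0.0 for t in range(7)] for f in range(5)]
toy_transient, toy_steady, toy_residual = decompose(toy)
```

In this toy case the sustained row lands in the steady-state output, the click column in the transient output, and the single bin where they cross is left in the residual because neither estimate dominates. A real system would operate on STFT frames of audio and would still need the masking-threshold stage described in the abstract, which this sketch does not attempt.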
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
