Visually Exploring Multi-Purpose Audio Data
David Heise, Helen L. Bear

TL;DR
This paper uses visualisation techniques to analyse multi-purpose audio data, revealing natural clusters and helping to explain classifier performance limitations, which can inform the development of more effective audio classification systems.
Contribution
It introduces the use of the visual assessment of cluster tendency (VAT) technique for visualising natural groupings in audio data, providing insight into classifier confusions and limitations.
Findings
VAT reveals natural clusters aligning with known labels
Explains classifier confusions observed in prior work
Highlights importance of data structure in classifier performance
Abstract
We analyse multi-purpose audio using tools to visualise similarities within the data that may be observed via unsupervised methods. The success of machine learning classifiers is affected by the information contained within system inputs, so we investigate whether latent patterns within the data may explain performance limitations of such classifiers. We use the visual assessment of cluster tendency (VAT) technique on a well-known data set to observe how the samples naturally cluster, and we make comparisons to the labels used for audio geotagging and acoustic scene classification. We demonstrate that VAT helps to explain and corroborate confusions observed in prior work on classifying this audio, yielding greater insight into the performance and limitations of supervised classification systems. While this exploratory analysis is conducted on data for which we know the "ground truth"…
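To make the VAT step in the abstract more concrete, the following is a minimal sketch of the VAT reordering as it is commonly described: start from a sample involved in the largest pairwise dissimilarity, then repeatedly append the unvisited sample that is closest to any sample already ordered. It is not the authors' implementation. The function names (`pairwise_euclidean`, `vat_order`, `vat_image`), the Euclidean distance, and the synthetic 2-D points standing in for audio features are all assumptions made for illustration.

```python
import numpy as np


def pairwise_euclidean(features):
    """Return the n x n Euclidean dissimilarity matrix for n feature rows."""
    X = np.asarray(features, dtype=float)
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def vat_order(dissimilarity):
    """Return the VAT ordering (a permutation of sample indices)."""
    R = np.asarray(dissimilarity, dtype=float)
    n = R.shape[0]
    # Start from a sample that takes part in the largest dissimilarity.
    start = int(np.unravel_index(np.argmax(R), R.shape)[0])
    order = [start]
    remaining = np.ones(n, dtype=bool)
    remaining[start] = False
    # min_dist[j] is the smallest dissimilarity from j to any ordered sample.
    min_dist = R[start].copy()
    for _ in range(n - 1):
        # Pick the unvisited sample nearest to the ordered set.
        candidates = np.where(remaining, min_dist, np.inf)
        nxt = int(np.argmin(candidates))
        order.append(nxt)
        remaining[nxt] = False
        min_dist = np.minimum(min_dist, R[nxt])
    return np.array(order)


def vat_image(dissimilarity):
    """Return the reordered dissimilarity matrix and the ordering used."""
    R = np.asarray(dissimilarity, dtype=float)
    order = vat_order(R)
    return R[np.ix_(order, order)], order


# Illustrative data only: two synthetic blobs, shuffled so that the
# original row order hides the grouping.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(10.0, 0.1, size=(30, 2)),
])
points = points[rng.permutation(len(points))]
reordered, order = vat_image(pairwise_euclidean(points))
```

Rendering `reordered` as a greyscale image (for example with matplotlib's `imshow`) would show clusters as dark square blocks along the diagonal. The positions of those blocks in `order` can then be compared against known labels, which is the kind of comparison the abstract describes.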
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies
