Validity in Music Information Research Experiments
Bob L. T. Sturm, Arthur Flexer

TL;DR
This paper emphasizes the importance of considering validity in music information research experiments to improve scientific rigor and better understand phenomena like adversarial attacks and performance limits.
Contribution
It reviews the concept of validity, applies it to MIR experiments, and offers guidance for making valid inferences from experimental data.
Findings
Highlights the importance of validity in MIR research
Analyzes four major types of validity in the context of MIR
Provides guidance for conducting valid MIR experiments
Abstract
Validity is the truth of an inference made from evidence, such as data collected in an experiment, and is central to working scientifically. Given the maturity of the domain of music information research (MIR), validity, in our opinion, should be discussed and considered much more than it has been so far. Considering validity in one's work can improve its scientific and engineering value. Puzzling MIR phenomena like adversarial attacks and performance glass ceilings become less mysterious through the lens of validity. In this article, we review the subject of validity in general, considering the four major types of validity from a key reference: Shadish et al. 2002. We ground our discussion of these types with a prototypical MIR experiment: music classification using machine learning. Through this, MIR experimentalists can be guided to make valid inferences from data collected from their…
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies
