The Biased Journey of MSD_AUDIO.ZIP
Haven Kim, Keunwoo Choi, Mateusz Modrzejewski, Cynthia C. S. Liem

TL;DR
This paper examines the access barriers and inequities faced by researchers trying to obtain the MSD_AUDIO.ZIP dataset, highlighting issues of data misreporting, API discontinuation, and restricted access within the MIR community.
Contribution
It provides an in-depth qualitative analysis of access challenges to the MSD_AUDIO.ZIP dataset based on interviews, emphasizing the need for more equitable data sharing practices.
Findings
Access to MSD_AUDIO.ZIP is restricted and uneven.
Misreporting and API discontinuation hinder data availability.
The MIR community needs to address data access inequalities.
Abstract
The equitable distribution of academic data is crucial for ensuring equal research opportunities and, ultimately, further progress. Yet, due to the complexity of using the API for audio data that corresponds to the Million Song Dataset along with its misreporting (before 2016) and the discontinuation of this API (after 2016), access to this data has become restricted to those within certain affiliations that are connected peer-to-peer. In this paper, we delve into this issue, drawing insights from the experiences of 22 individuals who either attempted to access the data or played a role in its creation. With this, we hope to initiate more critical dialogue and more thoughtful consideration with regard to access privilege in the MIR community.
Taxonomy
Topics: Diverse Musicological Studies · Music and Audio Processing · Speech Recognition and Synthesis
