Building the National Radio Recordings Database: A Big Data Approach to Documenting Audio Heritage
Emily Goodmann, Mark A. Matienzo, Shawn VanCour, William Vanden Dries

TL;DR
This paper discusses how the Radio Preservation Task Force of the Library of Congress's National Recording Preservation Board is developing a publicly searchable database documenting radio recordings held by collecting institutions across the United States, highlighting the technical, organizational, and ethical challenges of large-scale audio heritage preservation.
Contribution
It presents a detailed account of building a large-scale, searchable radio recordings database, emphasizing the social and institutional complexities involved.
Findings
Aggregated metadata on 2,500 unique radio collections to date.
Identified key logistical and ethical challenges.
Provided lessons for future big data projects.
Abstract
This paper traces strategies used by the Radio Preservation Task Force of the Library of Congress's National Recording Preservation Board to develop a publicly searchable database documenting extant radio materials held by collecting institutions throughout the country. Having aggregated metadata on 2,500 unique collections to date, the project has encountered a series of logistical challenges that are not only technical in nature but also institutional and social, raising critical issues involving organizational structure, political representation, and the ethics of data access. As the project continues to expand and evolve, lessons from its early development offer valuable reminders of the human judgment, hidden labor, and interpersonal relations required for successful big data work.
