On Barriers to Archival Audio Processing

Peter Sullivan; Muhammad Abdul-Mageed

arXiv:2507.08768·cs.SD·July 14, 2025

TL;DR

This paper evaluates the robustness of modern off-the-shelf speech processing tools on archival mid-20th century radio recordings. It finds that language identification copes well with second-language and accented speech, while speaker recognition remains vulnerable to channel, age, and language biases in speaker embeddings.

Contribution

The paper provides an empirical assessment of off-the-shelf language identification (LID) and speaker recognition (SR) methods on historical recordings, revealing their capabilities and limitations in archival contexts.

Findings

01. LID systems like Whisper handle multilingual and accented speech well

02. Speaker embeddings are sensitive to channel, age, and language biases

03. Archival SR methods need improvement for reliable speaker indexing

Abstract

In this study, we leverage a unique UNESCO collection of mid-20th century radio recordings to probe the robustness of modern off-the-shelf language identification (LID) and speaker recognition (SR) methods, especially with respect to the impact of multilingual speakers and cross-age recordings. Our findings suggest that LID systems, such as Whisper, are increasingly adept at handling second-language and accented speech. However, speaker embeddings remain a fragile component of speech processing pipelines, prone to biases related to channel, age, and language. These issues will need to be overcome if archives aim to employ SR methods for speaker indexing.
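
The abstract does not say how speaker embeddings are compared, so the following is only an illustrative sketch of one common setup, not the authors' method. It assumes that two recordings are attributed to the same speaker when the cosine similarity of their embeddings exceeds a threshold. The function names, the threshold of 0.5, and the three-dimensional toy vectors are all hypothetical; real embeddings come from a trained model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.5):
    """Hypothetical decision rule: accept when similarity reaches the threshold.

    The default threshold is an arbitrary illustrative value, not a tuned one.
    """
    return cosine_similarity(a, b) >= threshold


# Toy vectors for illustration only; they are not data from the paper.
enrolled = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]  # same direction, different magnitude
orthogonal = [0.0, 1.0, 0.0]      # no overlap with the enrolled vector

print(same_speaker(enrolled, same_direction))
print(same_speaker(enrolled, orthogonal))
```

Under a rule of this kind, anything that shifts embeddings in a consistent direction, such as a shared recording channel, a change in the speaker's age, or a change of language, can move scores across the threshold. That is the type of fragility the abstract describes for archival speaker indexing.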
