Computer Audition: From Task-Specific Machine Learning to Foundation Models

Andreas Triantafyllopoulos; Iosif Tsangko; Alexander Gebhard; Annamaria Mesaros; Tuomas Virtanen; Björn Schuller

arXiv:2407.15672·cs.SD·July 29, 2025·3 cites

TL;DR

This paper reviews the shift from traditional task-specific audio analysis methods to the development of general-purpose foundation models in computer audition, emphasizing their advantages and potential to unify multiple audio tasks.

Contribution

It provides an overview of how foundation models are transforming computational audio analysis and highlights key principles enabling multi-task capabilities.

Findings

01. Foundation models unify multiple audio tasks.
02. They leverage cross-modal knowledge.
03. They facilitate human interaction with audio systems.

Abstract

Foundation models (FMs) are increasingly spearheading recent advances on a variety of tasks that fall under the purview of computer audition -- the use of machines to understand sounds. They offer several advantages over traditional pipelines: among others, the ability to consolidate multiple tasks in a single model, the option to leverage knowledge from other modalities, and readily available interaction with human users. Naturally, these promises have created substantial excitement in the audio community and have led to a wave of early attempts to build new, general-purpose foundation models for audio. In the present contribution, we give an overview of computational audio analysis as it transitions from traditional pipelines towards auditory foundation models. Our work highlights the key operating principles that underpin those models, and showcases how they can accommodate…

Taxonomy

Topics: Neural Networks and Applications