OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, Shinji Watanabe

TL;DR
OpenBEATs is an open-source framework for multi-domain audio pre-training with masked token prediction. It reports state-of-the-art results across diverse audio understanding tasks and datasets, advancing general-purpose audio representation learning.
Contribution
OpenBEATs extends BEATs with multi-domain pre-training and releases open-source pre-training code, enabling broader application and reproducibility in general audio understanding.
Findings
State-of-the-art performance on bioacoustics and environmental sound datasets.
Effective multi-domain pre-training improves general audio representations.
OpenBEATs models outperform larger models while using a fraction of their parameters.
Abstract
Masked token prediction has emerged as a powerful pre-training objective across language, vision, and speech, offering the potential to unify these diverse modalities through a single pre-training task. However, its application to general audio understanding remains underexplored, with BEATs being the only notable example. BEATs has seen limited modifications due to the absence of open-source pre-training code. Furthermore, BEATs was trained only on AudioSet, restricting its broader downstream applicability. To address these gaps, we present OpenBEATs, an open-source framework that extends BEATs via multi-domain audio pre-training. We conduct comprehensive evaluations across six types of tasks, twenty-five datasets, and three audio domains, including audio reasoning tasks such as audio question answering, entailment, and captioning. OpenBEATs achieves state-of-the-art performance on…
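To make the masked token prediction objective concrete, here is a minimal sketch in plain Python. It is not the OpenBEATs or BEATs implementation: the function names, the 0.75 mask ratio, and the toy vocabulary size are assumptions chosen for illustration, and the discrete targets stand in for whatever an audio tokenizer would produce. The idea it shows is only that the model is scored with cross-entropy on the positions whose input was masked.

```python
import math
import random


def random_mask(num_frames, mask_ratio, rng):
    """Pick a random subset of frame positions to mask (hypothetical helper)."""
    num_masked = round(num_frames * mask_ratio)
    masked = set(rng.sample(range(num_frames), num_masked))
    return [i in masked for i in range(num_frames)]


def masked_token_prediction_loss(logits, targets, mask):
    """Mean cross-entropy over masked positions only.

    logits:  per-frame score lists, shape [T][V]
    targets: T discrete token ids (assumed to come from a tokenizer)
    mask:    T booleans, True where the input frame was masked
    """
    total, count = 0.0, 0
    for frame_logits, target, is_masked in zip(logits, targets, mask):
        if not is_masked:
            continue  # unmasked frames do not contribute to the loss
        # Numerically stable log-sum-exp for the softmax normalizer.
        peak = max(frame_logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in frame_logits))
        total += log_z - frame_logits[target]
        count += 1
    return total / count if count else 0.0


# Toy example: 8 frames, vocabulary of 4 tokens (illustrative values only).
rng = random.Random(0)
T, V = 8, 4
targets = [rng.randrange(V) for _ in range(T)]
mask = random_mask(T, 0.75, rng)

# A model that predicts uniformly scores log(V) on every masked frame.
uniform_logits = [[0.0] * V for _ in range(T)]
uniform_loss = masked_token_prediction_loss(uniform_logits, targets, mask)

# A model that puts a large score on the correct token scores close to zero.
confident_logits = [[20.0 if v == t else 0.0 for v in range(V)] for t in targets]
confident_loss = masked_token_prediction_loss(confident_logits, targets, mask)
```

In a real pre-training setup the logits would come from an encoder that sees the masked input, and the loss would be computed on batched tensors; the per-position logic would stay the same.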
Code & Models
- 🤗shikhar7ssu/OpenBEATS-Large-i1-esc50f1model· 1 dl1 dl
- 🤗shikhar7ssu/OpenBEATS-Large-i1-esc50f2model· 2 dl2 dl
- 🤗shikhar7ssu/OpenBEATS-Large-i1-esc50f3model· 3 dl3 dl
- 🤗shikhar7ssu/OpenBEATS-Large-i1-esc50f4model· 2 dl2 dl
- 🤗shikhar7ssu/OpenBEATS-Large-i1-esc50f5model
- 🤗shikhar7ssu/OpenBEATS-Large-i2-esc50f1model· 1 dl1 dl
- 🤗shikhar7ssu/OpenBEATS-Large-i2-esc50f2model· 1 dl1 dl
- 🤗shikhar7ssu/OpenBEATS-Large-i2-esc50f3model· 1 dl1 dl
- 🤗shikhar7ssu/OpenBEATS-Large-i2-esc50f4model
- 🤗shikhar7ssu/OpenBEATS-Large-i2-esc50f5model· 3 dl3 dl
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
