FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian

Sara Papi; Marco Gaido; Luisa Bentivogli; Alessio Brutti; Mauro Cettolo; Roberto Gretter; Marco Matassoni; Mohamed Nabih; Matteo Negri

arXiv:2505.22759·cs.CL·June 3, 2025

TL;DR

FAMA is the first family of open-science speech foundation models for English and Italian. Trained on 150k+ hours of open-source speech data, it performs competitively with existing speech foundation models while running up to 8 times faster, and all code, datasets, and models are openly available.

Contribution

The paper introduces FAMA, the first family of open-science speech foundation models for English and Italian, together with a new dataset of 16k hours of cleaned and pseudo-labeled speech and fully transparent artifacts, to promote open science in speech technology.

Findings

01

FAMA achieves competitive performance compared to existing speech foundation models (SFMs).

02

FAMA is up to 8 times faster than existing SFMs.

03

All artifacts, including code, datasets, and models, are openly released under open-source (OS) licenses.

Abstract

The development of speech foundation models (SFMs) like Whisper and SeamlessM4T has significantly advanced the field of speech processing. However, their closed nature, with inaccessible training data and code, poses major reproducibility and fair evaluation challenges. While other domains have made substantial progress toward open science by developing fully transparent models trained on open-source (OS) code and data, similar efforts in speech remain limited. To fill this gap, we introduce FAMA, the first family of open science SFMs for English and Italian, trained on 150k+ hours of OS speech data. Moreover, we present a new dataset containing 16k hours of cleaned and pseudo-labeled speech for both languages. Results show that FAMA achieves competitive performance compared to existing SFMs while being up to 8 times faster. All artifacts, including code, datasets, and models, are…

Code & Models

Repositories

hlt-mt/fbk-fairseq
PyTorch · Official

Datasets

FBK-MT/fama-data
dataset · 63 downloads

Taxonomy

Topics: Natural Language Processing Techniques