Slamming: Training a Speech Language Model on One GPU in a Day

Gallil Maimon; Avishai Elmakies; Yossi Adi

arXiv:2502.15814 · cs.LG · May 23, 2025



TL;DR

This paper presents Slam, a practical recipe for training high-quality Speech Language Models on a single GPU within 24 hours, making SLM research more accessible and scalable.

Contribution

The paper introduces a training recipe for SLMs that reaches competitive performance within a small compute and time budget, outperforming the compute-optimal performance predicted by existing SLM scaling laws.

Findings

01

Slam trains a high-quality SLM on a single GPU in 24 hours.

02

The recipe scales well with more compute, reaching results on par with leading SLMs at a fraction of their compute cost.

03

Results surpass the compute-optimal performance predicted by SLM scaling laws, giving an optimistic view of SLM feasibility.

Abstract

We introduce Slam, a recipe for training high-quality Speech Language Models (SLMs) on a single academic GPU in 24 hours. We do so through empirical analysis of model initialisation and architecture, synthetic training data, preference optimisation with synthetic data, and tweaking all other components. We empirically demonstrate that this training recipe also scales well with more compute, getting results on par with leading SLMs at a fraction of the compute cost. We hope these insights will make SLM training and research more accessible. In the context of SLM scaling laws, our results far outperform predicted compute-optimal performance, giving an optimistic view of SLM feasibility. Code, data, models, and samples are available at https://pages.cs.huji.ac.il/adiyoss-lab/slamming.
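To make the "single GPU in 24 hours" budget concrete, the sketch below turns a GPU-day into a rough FLOP budget and then into a token count for a given model size. It is only a back-of-the-envelope illustration: the function names are hypothetical, the peak throughput, utilisation, and parameter count are placeholder assumptions rather than values from the paper, and the C ≈ 6·N·D estimate is a common rule of thumb for dense transformer training, not something the abstract states.

```python
def flop_budget(peak_flops_per_s: float, utilisation: float, hours: float) -> float:
    """Total training FLOPs available.

    Computed as peak throughput x achieved fraction of peak x wall-clock seconds.
    """
    return peak_flops_per_s * utilisation * hours * 3600.0


def tokens_within_budget(budget_flops: float, n_params: float) -> float:
    """Tokens trainable under the rule of thumb C ~ 6 * N * D.

    C is the compute budget in FLOPs, N the parameter count, D the token count.
    """
    return budget_flops / (6.0 * n_params)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; none of them come from the paper.
    budget = flop_budget(peak_flops_per_s=100e12, utilisation=0.4, hours=24)
    tokens = tokens_within_budget(budget, n_params=350e6)
    print(f"budget: {budget:.3e} FLOPs, tokens: {tokens:.3e}")
```

Swapping in the real throughput of the GPU used and a measured utilisation gives the budget against which a compute-optimal prediction from a scaling law would be compared.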


Code & Models

Repositories

slp-rl/slamkit
PyTorch · Official


Videos

Slamming: Training a Speech Language Model on One GPU in a Day · Underline

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling