QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding

Subrata Biswas; Mohammad Nur Hossain Khan; Bashima Islam

arXiv:2505.14723 · eess.AS · August 18, 2025

TL;DR

QUADS is a unified framework that jointly optimizes quantization and distillation for efficient speech understanding, achieving high accuracy with significantly reduced model size and computational complexity.

Contribution

The paper introduces a multi-stage training method that optimizes quantization and distillation jointly rather than separately, improving both efficiency and accuracy for resource-constrained SLU systems.

Findings

01. Achieves 71.13% accuracy on SLURP and 99.20% on FSC, with degradations of up to 5.56% compared to state-of-the-art models.

02. Reduces model size by 83 to 700 times and computational complexity (GMACs) by 60 to 73 times.

03. Maintains robustness under extreme quantization.

Abstract

Spoken Language Understanding (SLU) systems must balance performance and efficiency, particularly in resource-constrained environments. Existing methods apply distillation and quantization separately, leading to suboptimal compression as distillation ignores quantization constraints. We propose QUADS, a unified framework that optimizes both through multi-stage training with a pre-tuned model, enhancing adaptability to low-bit regimes while maintaining accuracy. QUADS achieves 71.13% accuracy on SLURP and 99.20% on FSC, with only minor degradations of up to 5.56% compared to state-of-the-art models. Additionally, it reduces computational complexity by 60-73× (GMACs) and model size by 83-700×, demonstrating strong robustness under extreme quantization. These results establish QUADS as a highly efficient solution for real-world, resource-constrained SLU applications.
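
To make the idea of distilling under quantization constraints concrete, here is a minimal sketch in plain Python. It is not the QUADS implementation: the symmetric uniform quantizer, the temperature-scaled KL loss, and the names `fake_quantize`, `distillation_loss`, `student_logits`, `alpha`, and `temperature` are all illustrative assumptions. The point it shows is that the student's logits are computed from already-quantized weights, so the distillation loss reflects the low-bit regime instead of ignoring it.

```python
import math


def fake_quantize(weights, num_bits):
    """Symmetric uniform quantization: snap each weight to a low-bit grid,
    then map it back to a float (often called 'fake' quantization)."""
    if num_bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    levels = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student, teacher, label, alpha=0.5, temperature=2.0):
    """Weighted sum of KL(teacher || student) on softened outputs and
    cross-entropy against the ground-truth label (a common formulation,
    assumed here for illustration)."""
    p_teacher = softmax(teacher, temperature)
    p_student = softmax(student, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    ce = -math.log(softmax(student)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


def student_logits(weights, features, num_bits):
    """Toy linear student: quantize the weights first, then compute logits,
    so the loss is evaluated under the quantization constraint."""
    q = fake_quantize(weights, num_bits)
    n = len(features)
    return [sum(q[c * n + i] * features[i] for i in range(n))
            for c in range(len(weights) // n)]


# Hypothetical toy data: 2 classes, 3 input features.
features = [0.2, -0.4, 0.9]
weights = [0.8, -0.3, 0.5, -0.6, 0.7, -0.2]
teacher = [1.5, -0.8]

for bits in (8, 4, 2):
    logits = student_logits(weights, features, bits)
    print(bits, distillation_loss(logits, teacher, label=0))
```

In an actual training loop the quantizer would sit inside the forward pass and gradients would be passed through the rounding step (for example with a straight-through estimator), which this gradient-free sketch leaves out.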

Code & Models

Repositories

bashlab/quads (PyTorch, official)
