One Whisper to Grade Them All

Nhan Phan; Anusha Porwal; Yaroslav Getman; Ekaterina Voskoboinik; Tamás Grósz; Mikko Kurimo

arXiv:2507.17918 · cs.CL · October 7, 2025

TL;DR

This paper introduces an efficient end-to-end system for holistic automatic speaking assessment that processes multiple responses with a single Whisper encoder, outperforming text-based baselines and demonstrating high data efficiency.

Contribution

The architecture processes all four test parts with a single Whisper-small encoder and a lightweight aggregator. This removes the need for transcription and per-part models, which makes holistic speaking assessment practical at scale.
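The data flow described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the mean-pooling aggregator, the linear scoring head, and the names `grade_speaker`, `mean_pool`, and `toy_encoder` are all assumptions made for illustration, and `toy_encoder` merely stands in for a real Whisper-small encoder.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a sequence of equal-length vectors into one vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def grade_speaker(
    responses: Sequence[Sequence[float]],
    encode: Callable[[Sequence[float]], List[Vector]],
    weights: Vector,
    bias: float,
) -> float:
    """Predict one holistic score from all spoken responses of a speaker."""
    # The same shared encoder is applied to every test part,
    # so no transcription step and no per-part model is needed.
    part_vectors = [mean_pool(encode(audio)) for audio in responses]
    # Hypothetical lightweight aggregator: average the per-part vectors.
    speaker_vector = mean_pool(part_vectors)
    # Hypothetical linear head mapping the pooled vector to a single score.
    return sum(w * x for w, x in zip(weights, speaker_vector)) + bias


def toy_encoder(audio: Sequence[float]) -> List[Vector]:
    """Stand-in for a speech encoder: one 2-dimensional frame per sample."""
    return [[sample, sample * sample] for sample in audio]


# Four made-up "responses" for one speaker, purely for demonstration.
responses = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
score = grade_speaker(responses, toy_encoder, weights=[1.0, 0.5], bias=3.0)
print(score)
```

The point of the sketch is the shape of the pipeline: one encoder shared across parts, one small aggregation step, and one output per speaker. A real system would replace the averaging with a learned aggregator.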

Findings

01

Achieved an RMSE of 0.384, outperforming the text-based baseline (0.44).

02

Trained on only 44.8% of the speakers in the corpus (a 55.2% reduction) while still reaching 0.383 RMSE.

03

Uses at most 168M parameters (about 70% of Whisper-small), which keeps large-scale deployment practical.

Abstract

We present an efficient end-to-end approach for holistic Automatic Speaking Assessment (ASA) of multi-part second-language tests, developed for the 2025 Speak & Improve Challenge. Our system's main novelty is the ability to process all four spoken responses with a single Whisper-small encoder, combine all information via a lightweight aggregator, and predict the final score. This architecture removes the need for transcription and per-part models, cuts inference time, and makes ASA practical for large-scale Computer-Assisted Language Learning systems. Our system achieved a Root Mean Squared Error (RMSE) of 0.384, outperforming the text-based baseline (0.44) while using at most 168M parameters (about 70% of Whisper-small). Furthermore, we propose a data sampling strategy, allowing the model to train on only 44.8% of the speakers in the corpus and still reach 0.383 RMSE, demonstrating…
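For readers unfamiliar with the metric, the sketch below shows how Root Mean Squared Error is computed from predicted and reference scores. The function name `rmse` and the sample scores are illustrative assumptions, not data from the paper.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root Mean Squared Error between two equal-length score lists."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of squared differences, then the square root.
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up holistic scores for three speakers, for illustration only.
example_predicted = [3.5, 4.0, 2.5]
example_reference = [3.0, 4.5, 2.5]
print(rmse(example_predicted, example_reference))
```

Lower is better: the value is in the same units as the scores, so an RMSE of 0.384 means predictions deviate from the reference scores by roughly 0.38 score points in the root-mean-square sense.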
