MambaRate: Speech Quality Assessment Across Different Sampling Rates

Panos Kakoulidis; Iakovi Alexiou; Junkwang Oh; Gunu Jho; Inchul Hwang; Pirros Tsiakoulis; Aimilios Chalamandaris

arXiv:2507.12090·cs.SD·July 17, 2025

TL;DR

MambaRate is a novel speech quality assessment model that predicts MOS across different sampling rates using self-supervised embeddings and Gaussian RBF encoding, achieving competitive results in the AudioMOS Challenge 2025.

Contribution

It introduces a new approach combining self-supervised embeddings and RBF encoding for sampling rate-invariant MOS prediction, with strong initial results and improvements over baseline models.
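The idea of encoding a scalar rating with Gaussian radial basis functions can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the number of centers, their even spacing over the 1 to 5 MOS range, the width `sigma`, the normalization, and the weighted-mean decoding. The helper names `rbf_centers`, `rbf_encode`, and `rbf_decode` are hypothetical.

```python
import math


def rbf_centers(low=1.0, high=5.0, num=9):
    """Evenly spaced RBF centers over the MOS range (assumed layout)."""
    step = (high - low) / (num - 1)
    return [low + i * step for i in range(num)]


def rbf_encode(mos, centers, sigma=0.5):
    """Map a scalar MOS to a soft vector of Gaussian activations.

    Each entry is exp(-(mos - c)^2 / (2 * sigma^2)) for one center c.
    The vector is normalized to sum to 1 (an assumption of this sketch).
    """
    weights = [math.exp(-((mos - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]


def rbf_decode(encoding, centers):
    """Recover a scalar estimate as the activation-weighted mean of the centers."""
    return sum(w * c for w, c in zip(encoding, centers))


centers = rbf_centers()
encoding = rbf_encode(3.7, centers)
estimate = rbf_decode(encoding, centers)
```

In this sketch a model would regress or classify onto the encoded vector rather than onto the raw score, and `rbf_decode` turns a predicted vector back into a MOS estimate. The round trip is approximate, since the Gaussian is sampled on a finite grid.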

Findings

01  The initial T16 system outperformed the pre-trained baseline (B03) by ~14% in a few-shot setting without pre-training.

02  T16 ranked fourth out of five in the AudioMOS Challenge 2025, differing by ~6% from the winning system.

03  Additional experiments improved performance on the BVCC dataset.

Abstract

We propose MambaRate, which predicts Mean Opinion Scores (MOS) with limited bias regarding the sampling rate of the waveform under evaluation. It is designed for Track 3 of the AudioMOS Challenge 2025, which focuses on predicting MOS for speech at high sampling frequencies. Our model leverages self-supervised embeddings and selective state space modeling. The target ratings are encoded in a continuous representation via Gaussian radial basis functions (RBF). The results of the challenge were based on the system-level Spearman's Rank Correlation Coefficient (SRCC) metric. An initial MambaRate version (T16 system) outperformed the pre-trained baseline (B03) by ~14% in a few-shot setting without pre-training. T16 ranked fourth out of five in the challenge, differing by ~6% from the winning system. We present additional results on the BVCC dataset as well as ablations with different…
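A system-level SRCC of the kind named in the abstract is commonly computed by averaging scores per system and then rank-correlating the per-system means. The sketch below follows that common recipe and is not the challenge's official scoring script. The record layout `(system_id, true_mos, predicted_mos)` and the function names are hypothetical, and ties are handled with average ranks.

```python
import math
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def system_level_srcc(records):
    """Spearman correlation between per-system mean true and predicted MOS.

    records: iterable of (system_id, true_mos, predicted_mos) tuples,
    one per utterance (hypothetical layout for this sketch).
    """
    true_scores = defaultdict(list)
    pred_scores = defaultdict(list)
    for system_id, true_mos, predicted_mos in records:
        true_scores[system_id].append(true_mos)
        pred_scores[system_id].append(predicted_mos)
    systems = sorted(true_scores)
    true_means = [sum(true_scores[s]) / len(true_scores[s]) for s in systems]
    pred_means = [sum(pred_scores[s]) / len(pred_scores[s]) for s in systems]
    # Spearman is Pearson applied to the ranks of the two mean vectors.
    return pearson(average_ranks(true_means), average_ranks(pred_means))
```

Because only the ordering of per-system means matters, a model can score well on this metric even if its absolute MOS values are offset, as long as it ranks systems consistently with listeners.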

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing