DistilMOS: Layer-Wise Self-Distillation For Self-Supervised Learning Model-Based MOS Prediction

Jianing Yang; Wataru Nakata; Yuki Saito; Hiroshi Saruwatari

arXiv:2601.13700 · cs.SD · January 21, 2026

Open Access

TL;DR

DistilMOS introduces a layer-wise self-distillation approach that leverages internal SSL model representations to improve MOS prediction accuracy and generalization, effectively reducing overfitting and catastrophic forgetting.

Contribution

The paper proposes a novel self-distillation method that uses layer-wise token targets from SSL models to enhance MOS prediction performance and robustness.

Findings

1. Significantly outperforms standard SSL-based models on in-domain data.

2. Achieves better generalization on out-of-domain evaluations.

3. Effectively mitigates catastrophic forgetting during fine-tuning.

Abstract

With the advancement of self-supervised learning (SSL), fine-tuning pretrained SSL models for mean opinion score (MOS) prediction has achieved state-of-the-art performance. However, during fine-tuning, these SSL-based MOS prediction models often suffer from catastrophic forgetting of the pretrained knowledge and tend to overfit the training set, resulting in poor generalization performance. In this study, we propose DistilMOS, a novel method that learns to predict not only MOS but also token IDs obtained by clustering the hidden representations of each layer in the pretrained SSL model. These layer-wise token targets serve as self-distillation signals that enable the MOS prediction model to extract rich internal knowledge from SSL models, enhancing both prediction accuracy and generalization capability. Experimental evaluations demonstrate that our method significantly outperforms…
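The joint objective described in the abstract (predicting MOS together with layer-wise token IDs) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the use of k-means as the clustering step, the squared-error MOS loss, the averaging over layers, and every name here (`kmeans_token_ids`, `joint_loss`, `token_weight`) are assumptions made for illustration, and random arrays stand in for real SSL hidden states and model outputs.

```python
import numpy as np


def kmeans_token_ids(frames, num_clusters, num_iters=10, seed=0):
    """Cluster frame-level features of shape (T, D); return one token ID per frame.

    Assumption: plain k-means stands in for whatever clustering the paper uses.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen frames (fancy indexing copies).
    centroids = frames[rng.choice(len(frames), size=num_clusters, replace=False)]
    for _ in range(num_iters):
        dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        ids = dists.argmin(axis=1)
        for k in range(num_clusters):
            if np.any(ids == k):
                centroids[k] = frames[ids == k].mean(axis=0)
    # Final assignment against the last centroid update.
    dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)


def cross_entropy(logits, targets):
    """Mean cross-entropy for logits of shape (T, K) and integer targets of shape (T,)."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


def joint_loss(mos_pred, mos_true, layer_logits, layer_token_ids, token_weight=1.0):
    """MOS regression loss plus the layer-averaged token prediction loss.

    Assumption: squared error for MOS and a single hypothetical weight for the token term.
    """
    mos_loss = (mos_pred - mos_true) ** 2
    token_loss = np.mean(
        [cross_entropy(lg, ids) for lg, ids in zip(layer_logits, layer_token_ids)]
    )
    return mos_loss + token_weight * token_loss


# Toy usage: random arrays stand in for frozen SSL hidden states and model outputs.
rng = np.random.default_rng(0)
num_layers, T, D, K = 3, 50, 8, 4
teacher_hidden = [rng.normal(size=(T, D)) for _ in range(num_layers)]
targets = [kmeans_token_ids(h, K) for h in teacher_hidden]    # layer-wise token targets
student_logits = [rng.normal(size=(T, K)) for _ in range(num_layers)]
loss = joint_loss(3.2, 3.5, student_logits, targets)
```

In this sketch the token targets come from a frozen copy of the pretrained features, so the token term acts as a pull toward the pretrained representation while the MOS term fits the ratings; how the two terms are actually balanced is not stated in the abstract.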

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Sentiment Analysis and Opinion Mining · Topic Modeling