The T12 System for AudioMOS Challenge 2025: Audio Aesthetics Score Prediction System Using KAN- and VERSA-based Models

Katsuhiko Yamamoto; Koichi Miyazaki; Shogo Seki

arXiv:2512.05592·cs.SD·February 3, 2026

Open Access

TL;DR

The T12 system for AudioMOS Challenge 2025 Track 2 predicts audio aesthetics scores on four evaluation axes with an ensemble of four KAN-based models and one VERSA-based model, and yielded the best correlations among the submitted systems.

Contribution

The paper presents an ensemble that combines KAN-based and VERSA-based predictors for audio aesthetics score prediction.

Findings

01. Achieved the highest correlation scores among submissions.

02. Ensemble model improved prediction accuracy.

03. Released inference models for practical use.

Abstract

We propose an audio aesthetics score (AES) prediction system by CyberAgent (AESCA) for AudioMOS Challenge 2025 (AMC25) Track 2. AESCA comprises a Kolmogorov-Arnold Network (KAN)-based Audiobox Aesthetics predictor and a predictor built on metric scores computed with the VERSA toolkit. In the KAN-based predictor, we replaced each multi-layer perceptron layer in the baseline model with a group-rational KAN and trained the model with labeled and pseudo-labeled audio samples. The VERSA-based predictor was designed as a regression model using extreme gradient boosting, with the outputs of existing metrics as its input features. Both the KAN- and VERSA-based models predicted the AES on each of the four evaluation axes. The final AES values were calculated using an ensemble model that combined four KAN-based models and a VERSA-based model. Our proposed T12 system yielded the best correlations among the submitted systems,…
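The abstract does not say how the ensemble combines its five members, so the following is only a minimal sketch under a stated assumption: a weighted average of per-model predictions, with uniform weights by default. The function name `ensemble_scores`, the axis labels in `AXES`, and the random placeholder predictions are all hypothetical and are not taken from the paper or the released models.

```python
import numpy as np

# Hypothetical labels for the four evaluation axes (illustrative only).
AXES = ("PQ", "PC", "CE", "CU")


def ensemble_scores(model_preds, weights=None):
    """Weighted average over models.

    model_preds: array-like of shape (n_models, n_clips, n_axes).
    weights: optional array-like of shape (n_models,); uniform if None.
    Returns an array of shape (n_clips, n_axes).

    Assumption: the real system may combine its members differently
    (for example with learned weights or stacking).
    """
    preds = np.asarray(model_preds, dtype=float)
    if preds.ndim != 3:
        raise ValueError("expected shape (n_models, n_clips, n_axes)")
    n_models = preds.shape[0]
    if weights is None:
        weights = np.ones(n_models)
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (n_models,):
        raise ValueError("need exactly one weight per model")
    weights = weights / weights.sum()  # normalize so weights sum to 1
    # Contract the model dimension: (n_models,) x (n_models, n_clips, n_axes)
    return np.tensordot(weights, preds, axes=1)


# Placeholder predictions: four KAN-based models and one VERSA-based model,
# three clips, four axes. Random values stand in for real model outputs.
rng = np.random.default_rng(0)
kan_preds = rng.uniform(1.0, 10.0, size=(4, 3, len(AXES)))
versa_pred = rng.uniform(1.0, 10.0, size=(1, 3, len(AXES)))
all_preds = np.concatenate([kan_preds, versa_pred], axis=0)

final_aes = ensemble_scores(all_preds)
```

With uniform weights this reduces to the per-clip, per-axis mean over the five models; passing `weights` lets one model, such as the VERSA-based regressor, count more or less than the others.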

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · Aesthetic Perception and Analysis