# SageLM: A Multi-aspect and Explainable Large Language Model for Speech Judgement

**Authors:** Yuan Ge, Junxiang Zhang, Xiaoqian Liu, Bei Li, Xiangnan Ma, Chenglong Wang, Kaiyang Ye, Yangfan Du, Linfeng Zhang, Yuxin Huang, Tong Xiao, Zhengtao Yu, JingBo Zhu

arXiv: 2508.20916 · 2025-11-11

## TL;DR

SageLM is a speech LLM that judges the output of speech-to-speech models on both semantic and acoustic aspects. It uses rationale-based supervision for explainability and is trained in two stages with a synthetic preference dataset (SpeechFeedback), reaching 82.79% agreement with human evaluators.

## Contribution

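The paper introduces SageLM, a multi-aspect, explainable speech LLM for evaluating S2S LLMs, together with the synthetic SpeechFeedback preference dataset and a two-stage training paradigm. SageLM agrees with human evaluators more often than cascaded and SLM-based baselines.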

## Key findings

- Achieves 82.79% agreement with human evaluators.
- Outperforms cascaded baselines by at least 7.42%.
- Outperforms SLM-based baselines by at least 26.20%.
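
The agreement figures above compare a judge's preference labels with human labels over the same set of comparisons. The following is a minimal sketch of how such an agreement rate could be computed, assuming each comparison is labelled `"A"`, `"B"`, or `"tie"`. The function name, label set, and toy data are illustrative assumptions, not the paper's evaluation code or data.

```python
from typing import Sequence


def agreement_rate(judge_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Return the percentage of items where the judge matches the human label.

    Hypothetical helper: the label vocabulary ("A", "B", "tie") is an assumption.
    """
    if len(judge_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    if len(judge_labels) == 0:
        raise ValueError("at least one labelled comparison is required")
    # Count exact matches between the judge's verdict and the human verdict.
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return 100.0 * matches / len(judge_labels)


# Toy data for illustration only (not from the paper).
human = ["A", "B", "A", "tie", "B"]
judge = ["A", "B", "B", "tie", "B"]
print(f"{agreement_rate(judge, human):.2f}%")
```

On the toy data the judge matches four of five human labels. The paper's reported rates come from its own benchmark and protocol, which this summary does not describe.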

## Abstract

Speech-to-Speech (S2S) Large Language Models (LLMs) are foundational to natural human-computer interaction, enabling end-to-end spoken dialogue systems. However, evaluating these models remains a fundamental challenge. We propose SageLM, an end-to-end, multi-aspect, and explainable speech LLM for comprehensive S2S LLMs evaluation. First, unlike cascaded approaches that disregard acoustic features, SageLM jointly assesses both semantic and acoustic dimensions. Second, it leverages rationale-based supervision to enhance explainability and guide model learning, achieving superior alignment with evaluation outcomes compared to rule-based reinforcement learning methods. Third, we introduce *SpeechFeedback*, a synthetic preference dataset, and employ a two-stage training paradigm to mitigate the scarcity of speech preference data. Trained on both semantic and acoustic dimensions, SageLM achieves an 82.79% agreement rate with human evaluators, outperforming cascaded and SLM-based baselines by at least 7.42% and 26.20%, respectively.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20916/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20916/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/2508.20916/full.md

---
Source: https://tomesphere.com/paper/2508.20916