MCGA: A Multi-task Classical Chinese Literary Genre Audio Corpus

Yexing Du; Kaiyuan Liu; Bihe Zhang; Youcheng Pan; Bo Yang; Liangyu Huo; Xiyuan Zhang; Jian Xie; Daojing He; Yang Xiang; Ming Liu; Bing Qin

arXiv:2601.09270·cs.CL·April 14, 2026

TL;DR

The paper introduces MCGA, a comprehensive 119-hour audio corpus of Classical Chinese literary genres designed to evaluate and advance multimodal large language models in underexplored audio tasks.

Contribution

It presents a new multi-task audio corpus for Classical Chinese literature, evaluates ten existing MLLMs on it, and proposes domain-specific metrics for assessing MLLMs in this area.

Findings

01 Current MLLMs face significant challenges on MCGA tasks.

02 The corpus enables evaluation across six diverse tasks.

03 Proposed metrics help measure speech-text capability consistency.

Abstract

With the rapid advancement of Multimodal Large Language Models (MLLMs), their potential has gained significant attention in Chinese Classical Studies (CCS). While existing research primarily focuses on text and visual modalities, the audio corpus within this domain remains largely underexplored. To bridge this gap, we introduce the Multi-task Classical Chinese Literary Genre Audio Corpus (MCGA), a 119-hour corpus comprising 22,000 audio samples. It encompasses a diverse range of literary genres across six tasks: Automatic Speech Recognition (ASR), Speech-to-Text Translation (S2TT), Speech Emotion Captioning (SEC), Spoken Question Answering (SQA), Speech Understanding (SU), and Speech Reasoning (SR). Through the evaluation of ten MLLMs, our experimental results demonstrate that current MLLMs still face substantial challenges on the MCGA test set. Furthermore, we introduce a…
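The figures in the abstract allow a quick sanity check on corpus scale. The sketch below is illustrative only: the names `Task` and `mean_clip_seconds` are hypothetical and are not taken from the MCGA repository, and the result is approximate because the abstract gives rounded totals (119 hours, 22,000 samples). The abstract does not state how samples are distributed across tasks, so no per-task breakdown is attempted.

```python
from enum import Enum


class Task(Enum):
    """The six MCGA tasks as named in the abstract (hypothetical enum)."""
    ASR = "Automatic Speech Recognition"
    S2TT = "Speech-to-Text Translation"
    SEC = "Speech Emotion Captioning"
    SQA = "Spoken Question Answering"
    SU = "Speech Understanding"
    SR = "Speech Reasoning"


# Rounded totals quoted in the abstract.
TOTAL_HOURS = 119
NUM_SAMPLES = 22_000


def mean_clip_seconds(total_hours: float, num_samples: int) -> float:
    """Average clip length in seconds, assuming the totals cover all samples."""
    return total_hours * 3600 / num_samples


if __name__ == "__main__":
    print(f"tasks: {len(Task)}")
    print(f"mean clip length: {mean_clip_seconds(TOTAL_HOURS, NUM_SAMPLES):.1f} s")
```

Under these assumptions the average sample is a little under 20 seconds long.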

Code & Models

Repositories

yxduir/MCGA
github

Datasets

yxdu/MCGA
dataset· 121 dl