The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

Yuhao Liang; Mohan Shi; Fan Yu; Yangze Li; Shiliang Zhang; Zhihao Du; Qian Chen; Lei Xie; Yanmin Qian; Jian Wu; Zhuo Chen; Kong Aik Lee; Zhijie Yan; Hui Bu

arXiv:2309.13573·cs.SD·October 6, 2023·1 cites

Open Access

TL;DR

The paper presents M2MeT 2.0, a benchmark challenge for speaker-attributed automatic speech recognition in meetings, with two sub-tracks and a new test set, to evaluate current systems' ability to identify who spoke what and when.

Contribution

It introduces a new benchmark for speaker-attributed ASR with diverse sub-tracks and a new test set, advancing evaluation of meeting transcription systems.

Findings

01 Baseline systems show room for improvement in speaker attribution accuracy.

02 Open training conditions lead to better performance with more data.

03 Benchmark results highlight current challenges in speaker-attributed ASR.

Abstract

Following the success of the first Multi-channel Multi-party Meeting Transcription challenge (M2MeT), the second M2MeT challenge (M2MeT 2.0), held at ASRU 2023, tackles the complex task of speaker-attributed ASR (SA-ASR), which directly addresses the practical and challenging problem of "who spoke what and when" in typical meeting scenarios. We established two sub-tracks. In the fixed training condition sub-track, the training data is constrained to predetermined datasets, but participants can use any open-source pre-trained model. The open training condition sub-track allows the use of all available data and models without limitation. In addition, we release a new 10-hour test set for challenge ranking. This paper provides an overview of the dataset, track settings, results, and analysis of submitted systems, as a benchmark to show the…
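To make the "who spoke what and when" framing concrete, the following is a minimal sketch of how a speaker-attributed transcript might be represented and regrouped per speaker. The `Segment` class, the `transcript_by_speaker` helper, the speaker labels, and the example utterances are all hypothetical illustrations; they are not the challenge's data format or any participant's system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """One speaker-attributed utterance (hypothetical structure)."""
    speaker: str  # "who": an assumed speaker label
    start: float  # "when": start time in seconds
    end: float    # "when": end time in seconds
    text: str     # "what": the recognized words


def transcript_by_speaker(segments):
    """Concatenate each speaker's utterances in order of start time."""
    per_speaker = defaultdict(list)
    for seg in sorted(segments, key=lambda s: s.start):
        per_speaker[seg.speaker].append(seg.text)
    return {spk: " ".join(texts) for spk, texts in per_speaker.items()}


# Illustrative input: the segments overlap in time, as meeting speech often does.
segments = [
    Segment("spk2", 1.8, 3.0, "sounds good"),
    Segment("spk1", 0.0, 2.1, "let us begin"),
    Segment("spk1", 3.2, 4.0, "first item"),
]
result = transcript_by_speaker(segments)
print(result)
```

Keeping the time stamps on every segment means overlapping speech can be represented directly, rather than being forced into a single interleaved word stream.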

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling