MockConf: A Student Interpretation Dataset: Analysis, Word- and Span-level Alignment and Baselines

Dávid Javorský; Ondřej Bojar; François Yvon

arXiv:2506.04848 · cs.CL · June 6, 2025

TL;DR

MockConf is a new dataset and toolset for analyzing simultaneous interpreting, featuring multilingual recordings, alignments, and evaluation metrics to advance automatic interpretation analysis.

Contribution

The paper introduces MockConf, a novel multilingual dataset with span- and word-level alignments, and InterAlign, a web-based annotation tool for interpreting research.

Findings

01 Dataset includes 7 hours of multilingual interpreting recordings.

02 Introduces InterAlign, a tool for detailed annotation of interpreting data.

03 Provides baseline alignment metrics and evaluation methods.
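
This summary does not say which alignment metrics the paper reports. As an illustration only, word alignments are commonly scored by comparing predicted links against gold links with precision, recall, and F1. The sketch below assumes that setup; the function name `alignment_prf`, the `(source index, target index)` link representation, and the toy link sets are hypothetical and not taken from MockConf or InterAlign.

```python
from typing import Set, Tuple

# Assumed representation: one link = (source word index, target word index).
Link = Tuple[int, int]


def alignment_prf(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted links against gold links."""
    correct = len(predicted & gold)  # links present in both sets
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data, invented for illustration.
gold = {(0, 0), (1, 2), (2, 1), (3, 3)}
predicted = {(0, 0), (1, 2), (2, 2)}

p, r, f = alignment_prf(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Treating alignments as sets of index pairs keeps the scoring independent of how the links were produced, so the same function could in principle be applied to word-level or span-level links once spans are mapped to identifiers.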

Abstract

In simultaneous interpreting, an interpreter renders a source speech into another language with a very short lag, much sooner than sentences are finished. In order to understand and later reproduce this dynamic and complex task automatically, we need dedicated datasets and tools for analysis, monitoring, and evaluation, such as parallel speech corpora, and tools for their automatic annotation. Existing parallel corpora of translated texts and associated alignment algorithms hardly fill this gap, as they fail to model long-range interactions between speech segments or specific types of divergences (e.g., shortening, simplification, functional generalization) between the original and interpreted speeches. In this work, we introduce MockConf, a student interpreting dataset that was collected from Mock Conferences run as part of the students' curriculum. This dataset contains 7 hours of…

Code & Models

Repositories

J4VORSKY/MockConf
pytorch · Official

Videos

MockConf: A Student Interpretation Dataset: Analysis, Word- and Span-level Alignment and Baselines · Underline

Taxonomy

Topics: Text Readability and Simplification · Interpreting and Communication in Healthcare · Natural Language Processing Techniques