Evaluating Subtitle Segmentation for End-to-end Generation Systems
Alina Karakanta, François Buet, Mauro Cettolo, and François Yvon

TL;DR
This paper analyzes metrics for evaluating subtitle segmentation, introduces a new metric, Sigma, that assesses segmentation accuracy independently of text quality, and examines how the choice of metric affects system rankings.
Contribution
The paper systematically analyzes existing segmentation metrics and proposes Sigma, a novel metric that isolates segmentation quality from text content in subtitle generation.
Findings
Sigma effectively separates segmentation quality from text quality.
All metrics can reward high-quality outputs, but the system rankings they produce differ from one metric to another.
Sigma shows promise but needs further validation against human judgments.
Abstract
Subtitles appear on screen as short pieces of text, segmented based on formal constraints (length) and syntactic/semantic criteria. Subtitle segmentation can be evaluated with sequence segmentation metrics against a human reference. However, standard segmentation metrics cannot be applied when systems generate outputs different from the reference, e.g. with end-to-end subtitling systems. In this paper, we study ways to conduct reference-based evaluations of segmentation accuracy irrespective of the textual content. We first conduct a systematic analysis of existing metrics for evaluating subtitle segmentation. We then introduce Sigma, a new Subtitle Segmentation Score derived from an approximate upper bound of BLEU on segmentation boundaries, which allows us to disentangle the effect of good segmentation from text quality. To compare with existing metrics, we further propose a…
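To illustrate the limitation the abstract describes, here is a minimal sketch of a standard boundary-based segmentation score (F1 over break positions). It is not the paper's metric and makes no claim about how the paper's score is computed; the function names `boundary_positions` and `boundary_f1` are hypothetical, and whitespace tokenization is an assumption. The point is only that such a score is defined when the hypothesis and the reference contain the same words, which an end-to-end system does not guarantee.

```python
def boundary_positions(segments):
    """Word offsets at which a break occurs (the end of the last segment is excluded)."""
    positions = set()
    count = 0
    for seg in segments[:-1]:
        count += len(seg.split())  # assumption: whitespace tokenization
        positions.add(count)
    return positions


def boundary_f1(hyp_segments, ref_segments):
    """F1 over break positions; only defined when both sides contain identical text."""
    hyp_words = " ".join(hyp_segments).split()
    ref_words = " ".join(ref_segments).split()
    if hyp_words != ref_words:
        # Break offsets are not comparable once the word sequences differ.
        raise ValueError("boundary F1 needs identical text in hypothesis and reference")

    hyp = boundary_positions(hyp_segments)
    ref = boundary_positions(ref_segments)
    if not hyp and not ref:
        return 1.0  # neither side has an internal break

    true_positives = len(hyp & ref)
    precision = true_positives / len(hyp) if hyp else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reference = ["the cat sat", "on the mat"]
same_text = ["the cat", "sat on the mat"]      # same words, different break
different_text = ["a cat sat", "on the mat"]   # one word differs

score = boundary_f1(same_text, reference)
try:
    boundary_f1(different_text, reference)
    comparable = True
except ValueError:
    comparable = False
```

With identical text the score is well defined (here the single break is misplaced, so precision and recall are both zero). As soon as one word differs, the comparison is rejected, which is the situation a text-independent segmentation score is meant to handle.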
Taxonomy
Topics: Subtitles and Audiovisual Media · Translation Studies and Practices · Video Analysis and Summarization
