LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models
Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg

TL;DR
This paper introduces LibriSpeech-PC, a benchmark dataset and evaluation metric for assessing the punctuation and capitalization prediction abilities of end-to-end ASR models, addressing the limited data availability and inadequate assessment of punctuation prediction in existing evaluation methods.
Contribution
It provides a new dataset, a novel evaluation metric, and baseline models for better assessment of punctuation and capitalization in ASR systems.
Findings
Benchmark dataset with restored punctuation and capitalization
Novel Punctuation Error Rate (PER) metric introduced
Baseline models evaluated using the new benchmark
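This summary names the PER metric but does not give its formula. As a rough illustration only, the sketch below assumes PER is the share of punctuation errors (insertions, deletions, substitutions) among all punctuation operations, including correct ones. It also makes two simplifying assumptions: the reference and hypothesis contain the same words in the same order, and only the marks period, comma, and question mark are scored. The function names are hypothetical and this is not the authors' implementation.

```python
# Marks to score; this set is an assumption made for the sketch.
PUNCT = {".", ",", "?"}


def trailing_mark(token, marks=PUNCT):
    """Return the punctuation mark at the end of a token, or '' if none."""
    return token[-1] if token and token[-1] in marks else ""


def punctuation_error_rate(reference, hypothesis, marks=PUNCT):
    """Simplified PER: errors / (errors + correct), counted per word slot.

    Assumes both strings have the same words in the same order, so the
    punctuation after each word can be compared position by position.
    A real ASR evaluation would first need to align differing words.
    """
    ref_tokens = reference.split()
    hyp_tokens = hypothesis.split()
    if len(ref_tokens) != len(hyp_tokens):
        raise ValueError("this sketch requires equal-length token sequences")

    correct = substitutions = deletions = insertions = 0
    for ref_tok, hyp_tok in zip(ref_tokens, hyp_tokens):
        ref_mark = trailing_mark(ref_tok, marks)
        hyp_mark = trailing_mark(hyp_tok, marks)
        if ref_mark and hyp_mark:
            if ref_mark == hyp_mark:
                correct += 1        # same mark in both
            else:
                substitutions += 1  # different marks
        elif ref_mark:
            deletions += 1          # mark missing from the hypothesis
        elif hyp_mark:
            insertions += 1         # extra mark in the hypothesis

    errors = substitutions + deletions + insertions
    total = errors + correct
    return errors / total if total else 0.0


if __name__ == "__main__":
    ref = "Hello, world. How are you?"
    hyp = "Hello world. How, are you."
    # One deletion, one insertion, one substitution, one correct mark.
    print(punctuation_error_rate(ref, hyp))  # 0.75
```

Because words without punctuation in both strings are ignored, the score reflects only punctuation decisions and is not diluted by correctly recognized words, which is the sense in which a metric like this "focuses on punctuation marks".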
Abstract
Traditional automatic speech recognition (ASR) models output lower-cased words without punctuation marks, which reduces readability and necessitates a subsequent text processing model to convert ASR transcripts into a proper format. Simultaneously, the development of end-to-end ASR models capable of predicting punctuation and capitalization presents several challenges, primarily due to limited data availability and shortcomings in the existing evaluation methods, such as inadequate assessment of punctuation prediction. In this paper, we introduce a LibriSpeech-PC benchmark designed to assess the punctuation and capitalization prediction capabilities of end-to-end ASR models. The benchmark includes a LibriSpeech-PC dataset with restored punctuation and capitalization, a novel evaluation metric called Punctuation Error Rate (PER) that focuses on punctuation marks, and initial baseline…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
