LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models

Aleksandr Meister; Matvei Novikov; Nikolay Karpov; Evelina Bakhturina; Vitaly Lavrukhin; Boris Ginsburg

arXiv:2310.02943 · cs.CL · October 5, 2023

TL;DR

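This paper introduces LibriSpeech-PC, a benchmark dataset and evaluation metric for assessing the punctuation and capitalization prediction abilities of end-to-end ASR models. It targets two obstacles named in the abstract: limited data availability and the inadequate assessment of punctuation prediction by existing evaluation methods.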

Contribution

The paper provides the LibriSpeech-PC dataset with restored punctuation and capitalization, the Punctuation Error Rate (PER) metric, and initial baseline models for assessing punctuation and capitalization in end-to-end ASR systems.

Findings

01. Benchmark dataset with restored punctuation and capitalization

02. Novel Punctuation Error Rate (PER) metric introduced

03. Baseline models evaluated using the new benchmark

Abstract

Traditional automatic speech recognition (ASR) models output lower-cased words without punctuation marks, which reduces readability and necessitates a subsequent text processing model to convert ASR transcripts into a proper format. Simultaneously, the development of end-to-end ASR models capable of predicting punctuation and capitalization presents several challenges, primarily due to limited data availability and shortcomings in the existing evaluation methods, such as inadequate assessment of punctuation prediction. In this paper, we introduce a LibriSpeech-PC benchmark designed to assess the punctuation and capitalization prediction capabilities of end-to-end ASR models. The benchmark includes a LibriSpeech-PC dataset with restored punctuation and capitalization, a novel evaluation metric called Punctuation Error Rate (PER) that focuses on punctuation marks, and initial baseline…
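
The abstract describes PER only as a metric "that focuses on punctuation marks" and this page does not give its formula. As a rough illustration of what a punctuation-focused error rate can look like, the sketch below counts correct, substituted, inserted, and deleted punctuation marks and divides the errors by all punctuation events. Everything in it is an assumption made for illustration: the function names `punct_slots` and `simple_per`, the mark set (period, comma, question mark), the ratio itself, and the simplification that reference and hypothesis contain the same words so that no word alignment is needed. It should not be read as the paper's definition.

```python
# Illustrative sketch only: a simplified punctuation-focused error rate.
# Assumptions (not from the source page): the mark set, the formula
# errors / (errors + correct), and identical word sequences on both sides.

PUNCT = {".", ",", "?"}  # assumed set of punctuation marks to score


def punct_slots(text):
    """Return, for each whitespace-separated token, the trailing mark or ''."""
    slots = []
    for token in text.split():
        last = token[-1]
        slots.append(last if last in PUNCT else "")
    return slots


def simple_per(reference, hypothesis):
    """Compute a simplified punctuation error rate between two transcripts."""
    ref = punct_slots(reference)
    hyp = punct_slots(hypothesis)
    if len(ref) != len(hyp):
        # A full metric would align words first; this sketch does not.
        raise ValueError("sketch assumes the same number of words in both texts")

    correct = subs = ins = dels = 0
    for r, h in zip(ref, hyp):
        if r and h:
            if r == h:
                correct += 1   # same mark in both
            else:
                subs += 1      # different mark in the same position
        elif r:
            dels += 1          # mark missing from the hypothesis
        elif h:
            ins += 1           # extra mark in the hypothesis

    errors = subs + ins + dels
    total = errors + correct
    # With no punctuation on either side there is nothing to get wrong.
    return errors / total if total else 0.0
```

For example, `simple_per("Hello, world. How are you?", "Hello world. How are you.")` counts one correct period, one deleted comma, and one question mark replaced by a period, giving 2 errors out of 3 punctuation events. Unlike word error rate, words without punctuation on either side do not enter the denominator, so the score reflects punctuation quality alone.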

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems