Punctuation Prediction Model for Conversational Speech
Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel, Najim Dehak

TL;DR
This paper develops neural network models to predict punctuation in conversational speech transcripts, improving readability and NLP processing by leveraging time-aligned data and pre-trained embeddings.
Contribution
It introduces two neural network variants trained on the Fisher corpus, using a sequence alignment algorithm to combine time-aligned and punctuated transcripts, and demonstrates improved punctuation prediction accuracy on speech transcripts.
Findings
CNN models achieve higher precision, especially for question marks
BLSTM models have better recall and fewer overall mistakes
Using time-aligned data and pre-trained embeddings enhances prediction accuracy
Abstract
An ASR system usually does not predict any punctuation or capitalization. Lack of punctuation causes problems in result presentation and confuses both the human reader and off-the-shelf natural language processing algorithms. To overcome these limitations, we train two variants of Deep Neural Network (DNN) sequence labelling models - a Bidirectional Long Short-Term Memory (BLSTM) and a Convolutional Neural Network (CNN) - to predict the punctuation. The models are trained on the Fisher corpus, which includes punctuation annotation. In our experiments, we combine time-aligned and punctuated Fisher corpus transcripts using a sequence alignment algorithm. The neural networks are trained on Common Web Crawl GloVe embeddings of the words in Fisher transcripts, aligned with conversation side indicators and word time information. The CNNs yield better precision and BLSTMs tend to have better…
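The alignment step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which sequence alignment algorithm is used, so this example substitutes Python's standard-library difflib.SequenceMatcher as a stand-in. The function names, the tuple layout, and the label set (period, comma, question mark) are assumptions made for illustration, not the authors' implementation.

```python
from difflib import SequenceMatcher

# Assumed label set for illustration; the paper's exact labels may differ.
PUNCT = {".", ",", "?"}


def split_punct(tokens):
    """Split tokens such as 'you?' into (lowercased word, punctuation label)."""
    pairs = []
    for tok in tokens:
        if len(tok) > 1 and tok[-1] in PUNCT:
            pairs.append((tok[:-1].lower(), tok[-1]))
        else:
            pairs.append((tok.lower(), ""))
    return pairs


def align(timed, punctuated):
    """Transfer punctuation labels onto time-aligned words.

    timed:      list of (word, start_sec, end_sec) from a time-aligned transcript
    punctuated: list of tokens from a punctuated transcript of the same speech
    Returns a list of (word, start_sec, end_sec, label); words with no match
    in the punctuated transcript keep an empty label.
    """
    pairs = split_punct(punctuated)
    timed_words = [w.lower() for w, _, _ in timed]
    punct_words = [w for w, _ in pairs]
    labels = [""] * len(timed)

    matcher = SequenceMatcher(a=timed_words, b=punct_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical words in both transcripts.
        for k in range(block.size):
            labels[block.a + k] = pairs[block.b + k][1]

    return [(w, s, e, lab) for (w, s, e), lab in zip(timed, labels)]
```

Given a time-aligned transcript containing a filler word that the punctuated transcript omits, align still carries the comma and question mark over to the matching words, so each word ends up with both its timing and a punctuation label, which is the kind of combined input the abstract describes.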
Taxonomy
Methods: GloVe Embeddings
