Unified Multimodal Punctuation Restoration Framework for Mixed-Modality Corpus

Yaoming Zhu; Liwei Wu; Shanbo Cheng; Mingxuan Wang

arXiv:2202.00468·cs.CL·February 2, 2022

TL;DR

The paper introduces UniPunc, a unified multimodal framework that punctuates mixed-modality transcriptions with a single model by jointly representing samples with and without audio in a shared latent space. On real-world datasets it outperforms strong baselines such as BERT and MuSe by at least 0.8 overall F1.

Contribution

UniPunc is the first model to jointly represent audio and text in a shared latent space for punctuation restoration on mixed-modality data, so a single model can punctuate sentences both with and without accompanying audio.

Findings

01. Outperforms strong baselines (e.g. BERT, MuSe) by at least 0.8 overall F1.

02. Sets a new state of the art on real-world datasets.

03. UniPunc's design lets existing models punctuate a mixed corpus.
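The findings above are stated in terms of F1. As a minimal sketch of how such a score can be computed for token-level punctuation tags, the Python below counts true positives, false positives, and false negatives per punctuation label and derives a micro-averaged overall score. The label set, the `"O"` (no punctuation) tag, the function name `punctuation_f1`, and the choice of micro-averaging are assumptions made for illustration; this page does not state how the paper aggregates its overall F1.

```python
from collections import Counter

# Hypothetical label set; "O" (no punctuation after the token) is not scored.
PUNCT_LABELS = (",", ".", "?")


def punctuation_f1(gold, pred, labels=PUNCT_LABELS):
    """Return per-label F1 and a micro-averaged "overall" F1."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if p in labels:
            if g == p:
                tp[p] += 1   # predicted the right punctuation mark
            else:
                fp[p] += 1   # predicted a mark that is not in the gold tags
        if g in labels and g != p:
            fn[g] += 1       # missed (or mislabeled) a gold mark

    def f1(t, f_pos, f_neg):
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    scores = {lab: f1(tp[lab], fp[lab], fn[lab]) for lab in labels}
    # Micro-average: pool the counts over all labels before computing F1.
    scores["overall"] = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return scores


if __name__ == "__main__":
    gold = ["O", ",", "O", ".", "O", "?"]
    pred = ["O", ",", ",", ".", "O", "."]
    print(punctuation_f1(gold, pred))
```

Pooling counts before computing F1 weights frequent marks such as commas more heavily than rare ones such as question marks; a macro average over labels would treat them equally.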

Abstract

The punctuation restoration task aims to correctly punctuate the output transcriptions of automatic speech recognition systems. Previous punctuation models, which either use text only or require the corresponding audio, tend to be limited in real-world scenarios, where unpunctuated sentences are a mixture of those with and without audio. This paper proposes a unified multimodal punctuation restoration framework, named UniPunc, to punctuate the mixed sentences with a single model. UniPunc jointly represents audio and non-audio samples in a shared latent space, based on which the model learns a hybrid representation and punctuates both kinds of samples. We validate the effectiveness of UniPunc on real-world datasets, where it outperforms various strong baselines (e.g. BERT, MuSe) by at least 0.8 overall F1 score, setting a new state of the art. Extensive experiments show that UniPunc's design…
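To make the idea of one model serving both kinds of samples concrete, here is a toy Python sketch in which a text-only sample reuses a shared placeholder vector where audio features would otherwise go, so the tagger always sees inputs of the same shape. Everything in it is an assumption for illustration: the placeholder, the one-audio-vector-per-token alignment, the concatenation step, the linear scorer, and the names `fuse` and `tag`. It is not UniPunc's architecture, which this page describes only as a shared latent space.

```python
# Toy sketch under stated assumptions; not the UniPunc implementation.

# Hypothetical stand-in for missing audio. In a trained model this would
# more plausibly be a learned parameter than a fixed zero vector.
PLACEHOLDER_AUDIO = [0.0, 0.0]


def fuse(text_vecs, audio_vecs=None):
    """Concatenate each token's text vector with its audio vector.

    When audio_vecs is None (a text-only sample), every token gets the
    shared placeholder so the output shape matches the audio case.
    Assumes one audio vector per token, which is a simplification.
    """
    if audio_vecs is None:
        audio_vecs = [PLACEHOLDER_AUDIO] * len(text_vecs)
    if len(audio_vecs) != len(text_vecs):
        raise ValueError("expected one audio vector per token")
    return [list(t) + list(a) for t, a in zip(text_vecs, audio_vecs)]


def tag(fused, weights, labels):
    """Pick the highest-scoring label per token with a linear scorer."""
    tags = []
    for vec in fused:
        scores = [sum(w * x for w, x in zip(row, vec)) for row in weights]
        tags.append(labels[scores.index(max(scores))])
    return tags


if __name__ == "__main__":
    labels = ["O", ","]
    weights = [[1.0, 0.0, 0.0, 0.0],   # scores for "O"
               [0.0, 1.0, 0.0, 0.0]]   # scores for ","
    text = [[1.0, 0.0], [0.0, 1.0]]
    audio = [[0.5, 0.5], [0.2, 0.1]]
    print(tag(fuse(text), weights, labels))         # text-only sample
    print(tag(fuse(text, audio), weights, labels))  # sample with audio
```

The point of the sketch is only that both call paths go through the same `tag` function with the same weights, which is what lets a single model cover a mixed corpus.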

Code & Models

Repositories

yaoming95/unipunc (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dropout · Adam · Layer Normalization · Attention Dropout · Weight Decay