MAPGN: MAsked Pointer-Generator Network for sequence-to-sequence pre-training
Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura

TL;DR
This paper introduces MAPGN, a self-supervised pre-training method for pointer-generator networks that enhances spoken-text normalization, especially when limited paired data is available.
Contribution
The paper proposes a novel self-supervised learning approach, MAPGN, specifically designed for pointer-generator networks to improve spoken-text normalization with limited data.
Findings
MAPGN outperforms conventional self-supervised methods in spoken-text normalization tasks.
Pre-training with MAPGN improves model performance with limited paired data.
The method effectively utilizes unpaired text data for better normalization results.
Abstract
This paper presents a self-supervised learning method for pointer-generator networks to improve spoken-text normalization. Spoken-text normalization, which converts spoken-style text into style-normalized text, is becoming an important technology for improving subsequent processing such as machine translation and summarization. The most successful spoken-text normalization method to date is sequence-to-sequence (seq2seq) mapping using pointer-generator networks, which possess a mechanism for copying tokens from the input sequence. However, these models require a large amount of paired data of spoken-style text and style-normalized text, and it is difficult to prepare such a volume of data. In order to construct a spoken-text normalization model from limited paired data, we focus on self-supervised learning, which can utilize unpaired text data to improve seq2seq models. Unfortunately, conventional…
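To make the copy mechanism mentioned in the abstract concrete, the following is a minimal sketch of how a generic pointer-generator output layer is commonly formulated: a sigmoid gate mixes a distribution over the vocabulary with an attention distribution over the source tokens. It is not taken from the paper. The function names, the toy numbers, and the simplifying assumption that every source token is already in the vocabulary are all illustrative, and the sketch does not cover the masking-based pre-training itself, which the truncated abstract does not describe.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_generator_distribution(vocab_logits, attention_scores, source_ids, gate_logit):
    """Mix a generation distribution with a copy distribution.

    vocab_logits:     unnormalized scores over the vocabulary (hypothetical decoder output)
    attention_scores: unnormalized attention scores, one per source position
    source_ids:       vocabulary id of the token at each source position
                      (assumption: all source tokens are in-vocabulary)
    gate_logit:       scalar whose sigmoid is the probability of generating
    """
    p_vocab = softmax(vocab_logits)
    attention = softmax(attention_scores)
    p_gen = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid gate

    # Generation part: scale the vocabulary distribution by p_gen.
    final = [p_gen * p for p in p_vocab]

    # Copy part: add (1 - p_gen) * attention mass to each source token's id.
    for token_id, weight in zip(source_ids, attention):
        final[token_id] += (1.0 - p_gen) * weight
    return final


# Toy example: vocabulary of 5 tokens, source sentence of 3 tokens.
vocab_logits = [0.1, 0.2, 0.3, 0.4, 0.5]
attention_scores = [2.0, 0.5, 0.5]
source_ids = [3, 1, 3]  # token 3 appears twice in the source
dist = pointer_generator_distribution(
    vocab_logits, attention_scores, source_ids, gate_logit=-1.0
)
```

With a negative gate logit the model leans toward copying, so token 3, which appears twice in the toy source, receives most of the probability mass even though token 4 has the highest generation score. The result remains a valid probability distribution because both components are normalized and the gate weights sum to one.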
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
