Pseudo-Bidirectional Decoding for Local Sequence Transduction
Wangchunshu Zhou, Tao Ge, Ke Xu

TL;DR
This paper introduces Pseudo-Bidirectional Decoding (PBD), an approach for local sequence transduction (LST) tasks that gives the seq2seq decoder right-side context copied from the source, shares the encoder and decoder to reduce parameters, and improves performance on benchmarks.
Contribution
The paper proposes Pseudo-Bidirectional Decoding, which gives the decoder of a seq2seq model bidirectional context on LST tasks and shares the encoder and decoder, reducing the number of parameters while improving accuracy.
Findings
Consistent performance improvements on benchmark datasets.
Halves the number of model parameters and provides a regularization effect.
Enhances context modeling for local sequence transduction tasks.
Abstract
Local sequence transduction (LST) tasks are sequence transduction tasks in which the source and target sequences overlap heavily, such as Grammatical Error Correction (GEC) and spelling or OCR correction. Previous work generally tackles LST tasks with standard sequence-to-sequence (seq2seq) models that generate output tokens from left to right and suffer from the issue of unbalanced outputs. Motivated by the characteristics of LST tasks, in this paper, we propose a simple but versatile approach named Pseudo-Bidirectional Decoding (PBD) for LST tasks. PBD copies the corresponding representations of source tokens to the decoder as pseudo future context to enable the decoder to attend to its bidirectional context. In addition, the bidirectional decoding scheme and the characteristics of LST tasks motivate us to share the encoder and the decoder of seq2seq models. The…
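The copying step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `pseudo_bidirectional_context`, the plain-list vectors, and the `aligned_pos` argument are all hypothetical. In particular, how the paper aligns the current decoding step with a source position is not stated in this excerpt, so the sketch assumes that the index is simply given.

```python
from typing import List, Sequence

Vector = List[float]


def pseudo_bidirectional_context(
    prefix_states: Sequence[Vector],
    source_states: Sequence[Vector],
    aligned_pos: int,
) -> List[Vector]:
    """Build the context one decoding step could attend to.

    prefix_states: representations of the target tokens generated so far
                   (the real left context).
    source_states: encoder representations of the source tokens.
    aligned_pos:   ASSUMPTION: index of the source token aligned with the
                   token about to be generated. The alignment method is
                   not given in the excerpt, so it is passed in here.
    """
    if not 0 <= aligned_pos < len(source_states):
        raise ValueError("aligned_pos must index into source_states")
    # Source tokens to the right of the aligned position are copied in
    # as pseudo future (right-side) context.
    pseudo_future = [list(v) for v in source_states[aligned_pos + 1:]]
    # Left context from the decoder, followed by the copied right context.
    return [list(v) for v in prefix_states] + pseudo_future


# Toy usage: four source tokens, one target token generated so far.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
prefix = [[0.9, 0.1]]
context = pseudo_bidirectional_context(prefix, source, aligned_pos=1)
print(len(context))  # 3: one prefix state plus two copied source states
```

In a real model the returned states would feed the decoder's attention. The sketch only shows which representations end up visible at a given step.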
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
