Hierarchical Phrase-based Sequence-to-Sequence Learning
Bailin Wang, Ivan Titov, Jacob Andreas, Yoon Kim

TL;DR
This paper introduces a hierarchical phrase-based neural transducer that pairs a discriminative parser with a seq2seq model. Hierarchical phrases serve as an inductive bias during training and as explicit constraints during inference.
Contribution
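It presents a hierarchical phrase-based neural transducer that integrates a parser based on a bracketing transduction grammar (BTG) with a seq2seq model. The model supports two inference modes: seq2seq-only decoding at the sequence level, and decoding that combines the parser with the seq2seq model.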
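To make the idea of a BTG derivation concrete, here is a minimal sketch of how a binary tree with straight and inverted nodes yields aligned source/target phrases at every scale. The `Leaf`, `Node`, and `aligned_phrases` names and the English-French toy sentence are assumptions made for illustration; they are not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    """A terminal phrase pair (hypothetical toy alignment)."""
    src: str
    tgt: str


@dataclass
class Node:
    """A binary BTG node: straight keeps child order, inverted swaps it on the target side."""
    left: "Tree"
    right: "Tree"
    inverted: bool = False


Tree = Union[Leaf, Node]


def aligned_phrases(tree: Tree) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Return (source string, target string, phrase pairs for all subtrees)."""
    if isinstance(tree, Leaf):
        return tree.src, tree.tgt, [(tree.src, tree.tgt)]
    left_src, left_tgt, left_pairs = aligned_phrases(tree.left)
    right_src, right_tgt, right_pairs = aligned_phrases(tree.right)
    # The source side always reads left to right.
    src = f"{left_src} {right_src}"
    # The target side is swapped under an inverted rule.
    tgt = f"{right_tgt} {left_tgt}" if tree.inverted else f"{left_tgt} {right_tgt}"
    return src, tgt, left_pairs + right_pairs + [(src, tgt)]


toy_tree = Node(
    Leaf("the", "le"),
    Node(Leaf("black", "noir"), Leaf("cat", "chat"), inverted=True),
)
src, tgt, pairs = aligned_phrases(toy_tree)
for pair in pairs:
    print(pair)
```

Each pair in `pairs` is a phrase-level translation example, from single words up to the full sentence, which is the sense in which a single seq2seq model could be trained to translate at all phrase scales.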
Findings
Both inference modes outperform baselines on small machine translation benchmarks.
The model effectively incorporates hierarchical phrase structures into neural translation.
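In the mode that combines the parser with the seq2seq model, decoding uses the cube-pruned CKY algorithm, which is more involved than plain seq2seq decoding but allows translation rules to be applied during inference.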
Abstract
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference. Our approach trains two models: a discriminative parser based on a bracketing transduction grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one. We use the same seq2seq model to translate at all phrase scales, which results in two inference modes: one mode in which the parser is discarded and only the seq2seq component is used at the sequence-level, and another in which the parser is combined with the seq2seq model. Decoding in the latter mode is done with the cube-pruned CKY algorithm, which is more involved but can make use of new…
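As a rough illustration of CKY-style decoding over BTG spans, the sketch below fills a chart bottom-up and, for each source span, keeps the best of a direct phrase translation, a straight combination of two sub-spans, or an inverted combination. It is a simplified stand-in, not the authors' decoder: the paper scores phrases with a neural seq2seq model and uses cube pruning, whereas this version assumes a hypothetical `phrase_table` with log-probability-like scores, a made-up `reorder_penalty`, and exact 1-best search.

```python
from typing import Dict, List, Optional, Tuple

Chart = Dict[Tuple[int, int], Tuple[float, str]]


def cky_btg_decode(
    words: List[str],
    phrase_table: Dict[str, Tuple[str, float]],
    reorder_penalty: float = -0.5,
) -> Optional[Tuple[float, str]]:
    """Return the best (score, translation) for the whole sentence, or None."""
    n = len(words)
    best: Chart = {}
    for width in range(1, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            candidates: List[Tuple[float, str]] = []
            # Option 1: translate the whole span as a single phrase.
            span = " ".join(words[i:j])
            if span in phrase_table:
                tgt, score = phrase_table[span]
                candidates.append((score, tgt))
            # Options 2 and 3: split the span and combine the two halves.
            for k in range(i + 1, j):
                if (i, k) in best and (k, j) in best:
                    left_score, left_tgt = best[(i, k)]
                    right_score, right_tgt = best[(k, j)]
                    total = left_score + right_score
                    candidates.append((total, f"{left_tgt} {right_tgt}"))  # straight
                    candidates.append(
                        (total + reorder_penalty, f"{right_tgt} {left_tgt}")  # inverted
                    )
            if candidates:
                best[(i, j)] = max(candidates)
    return best.get((0, n))


# Hypothetical toy phrase table: source phrase -> (target phrase, score).
toy_table = {
    "the": ("le", -0.1),
    "black": ("noir", -0.2),
    "cat": ("chat", -0.2),
    "black cat": ("chat noir", -0.3),
}
result = cky_btg_decode("the black cat".split(), toy_table)
print(result)
```

In this toy setup the multi-word entry for "black cat" wins over word-by-word composition, which hints at why phrase-level rules can help at inference time. A cube-pruned decoder would keep a bounded list of candidates per span rather than only the single best.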
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence · Variational Inference
