Fast BTG-Forest-Based Hierarchical Sub-sentential Alignment
Hao Wang, Yves Lepage

TL;DR
This paper introduces a fast hierarchical sub-sentential alignment method based on BTG (Bracketing Transduction Grammar) forests. It matches fast_align in run time and overall translation quality, yields smaller phrase tables, and gives better results on a distant language pair (English-Japanese).
Contribution
A novel BTG-forest-based alignment approach whose parameters are initialized quickly and without supervision using variational IBM models. It runs as fast as fast_align, produces smaller phrase tables, and performs better on distantly related languages.
Findings
Achieves similar runtime to fast_align
Produces smaller phrase tables
Outperforms fast_align on English-Japanese translation
Abstract
In this paper, we propose a novel BTG-forest-based alignment method. Based on a fast unsupervised initialization of parameters using variational IBM models, we synchronously parse parallel sentences top-down and align hierarchically under the constraint of BTG. Our two-step method achieves the same run time as fast_align and comparable translation performance, while yielding smaller phrase tables. Final SMT results show that our method even outperforms fast_align on distantly related language pairs, e.g., English-Japanese.
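To make the phrase "parse top-down and align hierarchically under the constraint of BTG" concrete, here is a minimal sketch. It is not the authors' implementation: it keeps only a single greedy derivation rather than a forest, the function names (`block_sum`, `btg_align`) are hypothetical, and the splitting score (total association weight kept inside the two chosen sub-blocks) is an assumption, since the abstract does not state the scoring function. The input matrix `w` stands in for word-association scores that the paper obtains from variational IBM models.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Link = Tuple[int, int]


def block_sum(w: Matrix, i0: int, i1: int, j0: int, j1: int) -> float:
    """Sum of association weights in the block [i0, i1) x [j0, j1)."""
    return sum(w[i][j] for i in range(i0, i1) for j in range(j0, j1))


def btg_align(w: Matrix, i0: int, i1: int, j0: int, j1: int) -> List[Link]:
    """Greedy top-down alignment of source span [i0, i1) with target span [j0, j1).

    Each step splits the block into two sub-blocks, either straight
    (top-left + bottom-right) or inverted (top-right + bottom-left),
    which are the two binary rules of a BTG.
    """
    if i1 <= i0 or j1 <= j0:
        return []
    if i1 - i0 == 1:
        # One source word left: link it to its best target word in the span.
        j_best = max(range(j0, j1), key=lambda j: w[i0][j])
        return [(i0, j_best)]
    if j1 - j0 == 1:
        # One target word left: link it to its best source word in the span.
        i_best = max(range(i0, i1), key=lambda i: w[i][j0])
        return [(i_best, j0)]

    best_score = float("-inf")
    best_split = None
    for i in range(i0 + 1, i1):
        for j in range(j0 + 1, j1):
            straight = block_sum(w, i0, i, j0, j) + block_sum(w, i, i1, j, j1)
            inverted = block_sum(w, i0, i, j, j1) + block_sum(w, i, i1, j0, j)
            if straight > best_score:
                best_score, best_split = straight, (i, j, False)
            if inverted > best_score:
                best_score, best_split = inverted, (i, j, True)

    i, j, is_inverted = best_split
    if is_inverted:
        links = btg_align(w, i0, i, j, j1) + btg_align(w, i, i1, j0, j)
    else:
        links = btg_align(w, i0, i, j0, j) + btg_align(w, i, i1, j, j1)
    return sorted(links)


# Toy association matrix (made-up values): rows are source words, columns target words.
toy = [
    [0.1, 0.1, 0.9],
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
]
alignment = btg_align(toy, 0, 3, 0, 3)
print(alignment)
```

On this toy matrix the first split is inverted (source word 0 goes with the last target word), and the remaining 2x2 block is then split straight. A forest-based method would instead keep several competing splits at each level rather than committing to one.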
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
