Graph Algorithms for Multiparallel Word Alignment
Ayyoob Imani, Masoud Jalili Sabet, Lütfi Kerem Şenel, Philipp Dufter, François Yvon, Hinrich Schütze

TL;DR
This paper introduces graph-based algorithms that exploit multiparallel corpora to improve word alignment accuracy, reporting F1 improvements of up to 28% over the baseline aligner.
Contribution
It presents novel graph algorithms inspired by recommender systems and network link prediction to exploit multiparallel data for enhanced word alignment.
Findings
Up to 28% absolute F1 improvement over baseline aligner
Effective utilization of multiparallel corpora for alignment
Demonstrated across multiple datasets
Abstract
With the advent of end-to-end deep learning approaches in machine translation, interest in word alignments initially decreased; however, they have again become a focus of research more recently. Alignments are useful for typological research, transferring formatting like markup to translated texts, and can be used in the decoding of machine translation systems. At the same time, massively multilingual processing is becoming an important NLP scenario, and pretrained language and machine translation models that are truly multilingual are proposed. However, most alignment algorithms rely on bitexts only and do not leverage the fact that many parallel corpora are multiparallel. In this work, we exploit the multiparallelity of corpora by representing an initial set of bilingual alignments as a graph and then predicting additional edges in the graph. We present two graph algorithms for edge…
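The idea of treating initial bilingual alignments as a graph and then predicting additional edges can be sketched as follows. This is a minimal illustration only: it uses a generic common-neighbour link-prediction heuristic, not the two algorithms the paper presents, and the names `build_graph`, `predict_edges`, and `min_common` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(bilingual_alignments):
    """Build an undirected alignment graph for one multiparallel sentence.

    Nodes are (language, token_index) pairs; edges are the initial
    bilingual alignment links. The input maps a language pair to a list
    of (index_in_first_language, index_in_second_language) links.
    """
    graph = defaultdict(set)
    for (lang_a, lang_b), links in bilingual_alignments.items():
        for i, j in links:
            u, v = (lang_a, i), (lang_b, j)
            graph[u].add(v)
            graph[v].add(u)
    return graph


def predict_edges(graph, min_common=2):
    """Propose missing cross-lingual edges via common neighbours.

    Assumption: two tokens in different languages that are both aligned
    to at least `min_common` of the same tokens in other languages are
    likely translations of each other.
    """
    predicted = set()
    nodes = sorted(graph)
    for u, v in combinations(nodes, 2):
        # Skip same-language pairs and pairs that are already linked.
        if u[0] == v[0] or v in graph[u]:
            continue
        if len(graph[u] & graph[v]) >= min_common:
            predicted.add((u, v))
    return predicted


# Toy example: one token per language, with the en-es and de-fr links missing.
alignments = {
    ("en", "de"): [(0, 0)],
    ("en", "fr"): [(0, 0)],
    ("de", "es"): [(0, 0)],
    ("fr", "es"): [(0, 0)],
}
graph = build_graph(alignments)
new_edges = predict_edges(graph)
```

In this toy case the English and Spanish tokens share two neighbours (the German and French tokens), so the missing en-es link is proposed, and likewise for de-fr. A bitext-only aligner would have no access to this evidence.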
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
