Graph Algorithms for Multiparallel Word Alignment

Ayyoob Imani; Masoud Jalili Sabet; Lütfi Kerem Şenel; Philipp Dufter; François Yvon; Hinrich Schütze

arXiv:2109.06283·cs.CL·September 15, 2021

TL;DR

This paper introduces graph-based algorithms that use multiparallel corpora to improve word alignment, reporting absolute F1 improvements of up to 28% over the baseline aligner.

Contribution

It presents novel graph algorithms inspired by recommender systems and network link prediction to exploit multiparallel data for enhanced word alignment.

Findings

01 Up to 28% absolute F1 improvement over the baseline aligner
02 Effective use of multiparallel corpora for alignment
03 Demonstrated across multiple datasets

Abstract

With the advent of end-to-end deep learning approaches in machine translation, interest in word alignments initially decreased; however, they have again become a focus of research more recently. Alignments are useful for typological research, transferring formatting like markup to translated texts, and can be used in the decoding of machine translation systems. At the same time, massively multilingual processing is becoming an important NLP scenario, and pretrained language and machine translation models that are truly multilingual are proposed. However, most alignment algorithms rely on bitexts only and do not leverage the fact that many parallel corpora are multiparallel. In this work, we exploit the multiparallelity of corpora by representing an initial set of bilingual alignments as a graph and then predicting additional edges in the graph. We present two graph algorithms for edge…
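The core idea in the abstract, treating bilingual alignment links as graph edges and then predicting missing edges, can be illustrated with a small sketch. The two edge-prediction algorithms of the paper are not described in the truncated abstract, so this example swaps in a plain common-neighbours heuristic from link prediction; it is not the authors' method. The node encoding `(language, token_index)`, the function names `build_graph` and `predict_edges`, the `min_common` threshold, and the toy links are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(bilingual_alignments):
    """Build an undirected graph whose edges are the initial bilingual alignment links.

    Each node is a hypothetical (language, token_index) pair for one
    multiparallel sentence.
    """
    graph = defaultdict(set)
    for u, v in bilingual_alignments:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def predict_edges(graph, min_common=2):
    """Propose new edges between unlinked tokens of different languages.

    Stand-in heuristic (an assumption, not the paper's algorithm): two tokens
    are linked if they share at least `min_common` aligned neighbours.
    """
    predicted = set()
    for u, v in combinations(sorted(graph), 2):
        if u[0] == v[0]:
            continue  # same language: not an alignment candidate
        if v in graph[u]:
            continue  # already aligned by the bilingual aligner
        if len(graph[u] & graph[v]) >= min_common:
            predicted.add((u, v))
    return predicted


# Toy multiparallel sentence with one token per language (illustrative data).
links = [
    (("en", 0), ("de", 0)),
    (("en", 0), ("fr", 0)),
    (("de", 0), ("es", 0)),
    (("fr", 0), ("es", 0)),
]
graph = build_graph(links)
predicted = predict_edges(graph)
print(sorted(predicted))
# [(('de', 0), ('fr', 0)), (('en', 0), ('es', 0))]
```

In this toy case the English-Spanish and German-French pairs were never linked directly, but each pair shares two aligned neighbours in other languages, so the heuristic proposes both edges. This is the kind of evidence a purely bitext-based aligner cannot use.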

Code & Models

Repositories

cisnlp/graph-align (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification