Using external sources of bilingual information for on-the-fly word alignment
Miquel Esplà-Gomis, Felipe Sánchez-Martínez, Mikel L. Forcada

TL;DR
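This paper introduces a simple, language-independent word alignment method that draws on external sources of bilingual information, such as machine translation systems. Its few parameters can be trained on a very small gold standard (450 sentence pairs), giving precision comparable to the state-of-the-art tool GIZA++, and the results indicate that the training is domain-independent.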
Contribution
The paper presents a new word alignment approach with only a few parameters, which leverages external sources of bilingual information and can be trained on a very small corpus without depending on its domain.
Findings
Achieves precision comparable to GIZA++ with minimal training data.
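When trained on 450 sentence pairs, reaches alignment error rate and F-measure comparable to GIZA++ trained on an in-domain corpus of around 10,000 sentence pairs.
Requires only a small training corpus, and the results indicate that training is domain-independent.
Once trained, can be used 'on the fly' on any new pair of sentences.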
Abstract
In this paper we present a new and simple language-independent method for word-alignment based on the use of external sources of bilingual information such as machine translation systems. We show that the few parameters of the aligner can be trained on a very small corpus, which leads to results comparable to those obtained by the state-of-the-art tool GIZA++ in terms of precision. Regarding other metrics, such as alignment error rate or F-measure, the parametric aligner, when trained on a very small gold-standard (450 pairs of sentences), provides results comparable to those produced by GIZA++ when trained on an in-domain corpus of around 10,000 pairs of sentences. Furthermore, the results obtained indicate that the training is domain-independent, which enables the use of the trained aligner 'on the fly' on any new pair of sentences.
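The abstract compares aligners by precision, F-measure and alignment error rate (AER). As a rough illustration of how these scores relate, the sketch below computes them from sets of alignment links using the commonly cited definitions based on "sure" and "possible" gold links. This is an assumption for illustration only: the summary does not say which exact variant the authors used, and the function name `alignment_scores` and the link representation are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# A link pairs a source-word index with a target-word index (hypothetical representation).
Link = Tuple[int, int]


def alignment_scores(
    predicted: Set[Link],
    sure: Set[Link],
    possible: Optional[Set[Link]] = None,
) -> Dict[str, float]:
    """Score predicted links against a gold standard.

    Assumed definitions (not taken from the paper summary):
      precision = |A & P| / |A|
      recall    = |A & S| / |S|
      AER       = 1 - (|A & S| + |A & P|) / (|A| + |S|)
    where A = predicted, S = sure links, P = possible links (a superset of S).
    If no possible links are given, P defaults to S.
    """
    possible = sure if possible is None else (possible | sure)

    hits_sure = len(predicted & sure)
    hits_possible = len(predicted & possible)

    # Guard against empty sets to avoid division by zero.
    precision = hits_possible / len(predicted) if predicted else 0.0
    recall = hits_sure / len(sure) if sure else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall > 0
        else 0.0
    )
    denominator = len(predicted) + len(sure)
    aer = 1.0 - (hits_sure + hits_possible) / denominator if denominator else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "aer": aer,
    }


if __name__ == "__main__":
    # Toy example: three predicted links, two of which match the gold standard.
    gold = {(0, 0), (1, 1), (2, 2)}
    guess = {(0, 0), (1, 1), (2, 3)}
    print(alignment_scores(guess, gold))
```

Under these assumed definitions, when every gold link is treated as sure, AER equals one minus the F-measure, so a gap between the two only appears when the gold standard also marks possible links.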
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
