SilverAlign: MT-Based Silver Data Algorithm For Evaluating Word Alignment
Abdullatif Köksal, Silvia Severini, Hinrich Schütze

TL;DR
SilverAlign is a novel method that automatically generates silver data for evaluating word aligners using machine translation and minimal pairs, addressing the lack of gold data, especially for low-resource languages.
Contribution
It introduces a new automatic approach to create evaluation data for word aligners, enabling assessment without gold standards across multiple languages.
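Evaluating an aligner against silver data still requires a scoring function. The page does not name one. The sketch below is an assumption: it uses the standard precision, recall, F1 and alignment error rate (AER) over sets of (source index, target index) links. It treats every silver link as a sure link, so AER reduces to 1 - F1. The function name and the toy links are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# An alignment is a set of (source_token_index, target_token_index) links.
Alignment = Set[Tuple[int, int]]


def alignment_scores(
    predicted: Alignment,
    sure: Alignment,
    possible: Optional[Alignment] = None,
) -> Dict[str, float]:
    """Precision, recall, F1 and AER of `predicted` against a reference.

    Assumption: silver references carry only sure links, so `possible`
    defaults to `sure`.
    """
    # Every sure link also counts as a possible link.
    possible = sure if possible is None else possible | sure
    hits_sure = len(predicted & sure)
    hits_possible = len(predicted & possible)
    precision = hits_possible / len(predicted) if predicted else 0.0
    recall = hits_sure / len(sure) if sure else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = len(predicted) + len(sure)
    aer = 1.0 - (hits_sure + hits_possible) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "aer": aer}


# Toy example (hypothetical links, not data from the paper).
predicted = {(0, 0), (1, 2), (2, 1)}
silver = {(0, 0), (1, 2), (3, 3)}
scores = alignment_scores(predicted, silver)
print(scores)
```

When a gold set distinguishes sure from possible links, it can be passed as `possible`. The silver case simply leaves that argument out.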
Findings
Performance on silver data correlates well with gold benchmarks.
The correlation holds across 9 language pairs.
Useful for evaluating word aligners on low-resource languages, where gold alignments are missing.
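The central finding is that aligner performance on silver data tracks performance on gold data. One common way to quantify that agreement is Spearman's rank correlation between the scores several aligners obtain on each benchmark. This choice is an assumption, not something the page specifies. The sketch is dependency-free. The scores in the usage lines are invented toy numbers, not results from the paper.

```python
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Toy F1 scores of four hypothetical aligners (invented, not from the paper).
silver_f1 = [0.60, 0.72, 0.81, 0.90]
gold_f1 = [0.55, 0.70, 0.78, 0.88]
print(spearman(silver_f1, gold_f1))
```

A rank correlation only asks whether silver data orders the aligners the same way gold data does. It ignores any constant offset between silver and gold scores. If SciPy is available, `scipy.stats.spearmanr` computes the same statistic.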
Abstract
Word alignments are essential for a variety of NLP tasks. Therefore, choosing the best approaches for their creation is crucial. However, the scarce availability of gold evaluation data makes the choice difficult. We propose SilverAlign, a new method to automatically create silver data for the evaluation of word aligners by exploiting machine translation and minimal pairs. We show that performance on our silver data correlates well with gold benchmarks for 9 language pairs, making our approach a valid resource for evaluation of different domains and languages when gold data are not available. This addresses the important scenario of missing gold data alignments for low-resource languages.
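The abstract says only that SilverAlign exploits machine translation and minimal pairs. The following is one hypothetical reading of how a single silver link could be derived. First, translate a sentence and a minimally edited copy in which one token is swapped. If the two translations also differ in exactly one position, link the edited source token to the changed target token. The function name and the positional filter are assumptions, not the authors' code. The "translations" are hand-written stand-ins for MT output.

```python
from typing import List, Optional, Tuple


def link_from_minimal_pair(
    src: List[str],
    src_edited: List[str],
    tgt: List[str],
    tgt_edited: List[str],
) -> Optional[Tuple[int, int]]:
    """Return one (source_index, target_index) silver link, or None.

    Hypothetical rule: the source pair must differ in exactly one token and
    so must its translations. Comparing position by position is conservative:
    pairs whose translations change length or word order are discarded.
    """
    if len(src) != len(src_edited) or len(tgt) != len(tgt_edited):
        return None
    src_diff = [i for i, (a, b) in enumerate(zip(src, src_edited)) if a != b]
    tgt_diff = [j for j, (a, b) in enumerate(zip(tgt, tgt_edited)) if a != b]
    if len(src_diff) == 1 and len(tgt_diff) == 1:
        return (src_diff[0], tgt_diff[0])
    return None


# Hand-written stand-ins for MT output (English -> German).
src = "the cat sleeps".split()
print(link_from_minimal_pair(
    src, "the mouse sleeps".split(),
    "die Katze schläft".split(), "die Maus schläft".split(),
))
print(link_from_minimal_pair(
    src, "the dog sleeps".split(),
    "die Katze schläft".split(), "der Hund schläft".split(),
))
```

The first pair yields a link between "cat" and "Katze". The second pair is rejected because swapping "cat" for "dog" also changes the German article, so two target tokens differ and the link would be ambiguous.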
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
