Annotated Guidelines and Building Reference Corpus for Myanmar-English Word Alignment
Nway Nway Han, Aye Thida

TL;DR
This paper develops annotation guidelines and builds a reference corpus for Myanmar-English word alignment from verified annotations and systematic instructions, making it possible to evaluate alignment methods for a language pair that previously had no such resource.
Contribution
It introduces the first reference corpus for Myanmar-English word alignment, together with annotation guidelines and verified alignments, to support future research and evaluation.
Findings
High inter-annotator agreement, demonstrated by a low alignment error rate (AER).
Systematic guidelines reduce alignment ambiguities.
Correlation between BLEU scores and alignment quality.
Abstract
A reference corpus for word alignment is an important resource for developing and evaluating word alignment methods. For the Myanmar-English language pair, no reference corpus exists for evaluating word alignment. We therefore created guidelines for Myanmar-English word alignment annotation, based on contrastive learning between the two languages, and built a Myanmar-English reference corpus of verified alignments drawn from the Myanmar ALT data of the Asian Language Treebank (ALT). Each alignment in the corpus carries one of two confidence labels, sure (S) or possible (P), and the corpus is intended as test data for evaluating word alignment methods. We discuss the most common linking ambiguities in order to define consistent, systematic instructions for manual word alignment. We evaluated inter-annotator agreement on our reference corpus in terms of alignment error rate (AER) in word…
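The sure/possible labels matter because they feed directly into AER. The sketch below uses the standard definition from Och and Ney (2003), AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|), where A is the set of hypothesized links and S ⊆ P. It is an illustration only, not the authors' evaluation script. It assumes alignments are represented as sets of (Myanmar index, English index) pairs, and the function names are hypothetical. Measuring agreement by treating one annotator's links as A and another's labels as the reference is likewise an assumption about how such a comparison could be set up.

```python
"""Sketch of alignment error rate (AER) with sure (S) and possible (P) links.

Assumptions (not from the paper): links are (myanmar_idx, english_idx)
tuples, and the function names below are hypothetical.
"""
from typing import Set, Tuple

Link = Tuple[int, int]  # (Myanmar word index, English word index)


def aer(hyp: Set[Link], sure: Set[Link], possible: Set[Link]) -> float:
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), following Och & Ney (2003)."""
    possible = possible | sure  # every sure link also counts as possible
    if not hyp and not sure:
        return 0.0  # nothing proposed and nothing required
    matched = len(hyp & sure) + len(hyp & possible)
    return 1.0 - matched / (len(hyp) + len(sure))


def precision_recall(hyp: Set[Link], sure: Set[Link],
                     possible: Set[Link]) -> Tuple[float, float]:
    """Precision is measured against P, recall against S."""
    possible = possible | sure
    precision = len(hyp & possible) / len(hyp) if hyp else 0.0
    recall = len(hyp & sure) / len(sure) if sure else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy example: two sure links and one extra possible link.
    sure = {(0, 0), (1, 2)}
    possible = {(0, 0), (1, 2), (2, 1)}
    hyp = {(0, 0), (2, 1), (3, 3)}
    # |A & S| = 1, |A & P| = 2, |A| = 3, |S| = 2, so AER = 1 - 3/5 = 0.4
    print(round(aer(hyp, sure, possible), 3))
    print(precision_recall(hyp, sure, possible))
```

Because a hypothesis earns credit for matching either kind of link but is only penalized against the sure set in the denominator, labeling doubtful links as possible rather than sure keeps genuine ambiguity from inflating the error rate. This is one reason explicit S/P guidelines help annotators agree.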
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
