Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
Steffen Eger, Johannes Daxenberger, Christian Stab, Iryna Gurevych

TL;DR
This paper demonstrates that simple machine translation combined with annotation projection effectively enables cross-lingual argumentation mining, outperforming other transfer methods and working well with both human and machine translations.
Contribution
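It introduces parallel corpora for cross-lingual AM, built by human and machine translation of a persuasive student essay dataset into German, French, Spanish, and Chinese, and compares two transfer strategies, showing that annotation projection over machine translations is highly effective.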
Findings
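Annotation projection considerably outperforms direct transfer based on bilingual word embeddings in cross-lingual AM and almost eliminates the loss from cross-lingual transfer.
Annotation projection works equally well with cheap machine translations and costly human translations.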
The approach is cost-effective and adaptable across multiple languages.
Abstract
Argumentation mining (AM) requires the identification of complex discourse structures and has lately been applied with success monolingually. In this work, we show that the existing resources are, however, not adequate for assessing cross-lingual AM, due to their heterogeneity or lack of complexity. We therefore create suitable parallel corpora by (human and machine) translating a popular AM dataset consisting of persuasive student essays into German, French, Spanish, and Chinese. We then compare (i) annotation projection and (ii) bilingual word embeddings based direct transfer strategies for cross-lingual AM, finding that the former performs considerably better and almost eliminates the loss from cross-lingual transfer. Moreover, we find that annotation projection works equally well when using either costly human or cheap machine translations. Our code and data are available at…
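The annotation projection idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes token-level BIO labels, a word alignment that is already given as (source index, target index) pairs, and a simple heuristic that maps each source component to the smallest contiguous target span covering its aligned tokens. The function names `bio_to_spans` and `project`, the label `Claim`, and the toy sentence pair are all hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end inclusive, label)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect labeled spans from a BIO tag sequence."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i - 1, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def project(src_tags: List[str],
            alignment: List[Tuple[int, int]],
            tgt_len: int) -> List[str]:
    """Project source BIO tags onto a target sentence.

    Assumption (illustrative heuristic): each source span maps to the
    contiguous target range between its leftmost and rightmost aligned token.
    Spans with no aligned target token are dropped.
    """
    tgt_tags = ["O"] * tgt_len
    for start, end, label in bio_to_spans(src_tags):
        targets = [t for s, t in alignment if start <= s <= end]
        if not targets:
            continue
        lo, hi = min(targets), max(targets)
        tgt_tags[lo] = "B-" + label
        for j in range(lo + 1, hi + 1):
            tgt_tags[j] = "I-" + label
    return tgt_tags


# Toy example (hypothetical data): English source, German target.
src_tokens = ["Therefore", ",", "uniforms", "are", "useful"]
src_tags = ["O", "O", "B-Claim", "I-Claim", "I-Claim"]
tgt_tokens = ["Deshalb", "sind", "Uniformen", "nützlich"]
alignment = [(0, 0), (2, 2), (3, 1), (4, 3)]  # (source index, target index)

tgt_tags = project(src_tags, alignment, len(tgt_tokens))
print(list(zip(tgt_tokens, tgt_tags)))
```

Taking the minimum and maximum aligned positions keeps each projected component contiguous even when word order differs between languages, at the cost of occasionally absorbing an unaligned token that falls inside the range.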
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques
Methods: Attention Model
