ACE-2005-PT: Corpus for Event Extraction in Portuguese
Luís Filipe Cunha, Purificação Silvano, Ricardo Campos, Alípio Jorge

TL;DR
This paper presents ACE-2005-PT, a Portuguese version of the ACE-2005 corpus for event extraction, created through automatic translation and alignment techniques, enabling broader NLP research in Portuguese.
Contribution
The paper introduces a novel pipeline for translating and aligning ACE-2005 into Portuguese, including multiple alignment techniques and evaluation methods.
Findings
Achieved 70.55% exact match accuracy in alignment
Achieved 87.55% relaxed match accuracy
Successfully generated a Portuguese ACE-2005 corpus accepted by LDC
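
The two alignment scores above can be read with a small sketch. The definitions used here are assumptions made for illustration and are not taken from the paper: an exact match means the aligned span equals the reference span after lowercasing and whitespace tokenisation, and a relaxed match means the two spans share at least one token. The function names and the example pairs are hypothetical.

```python
from typing import Callable, List, Tuple


def normalize(span: str) -> List[str]:
    """Lowercase a span and split it on whitespace."""
    return span.lower().split()


def exact_match(predicted: str, reference: str) -> bool:
    """Assumed definition: token sequences are identical."""
    return normalize(predicted) == normalize(reference)


def relaxed_match(predicted: str, reference: str) -> bool:
    """Assumed definition: the spans share at least one token."""
    return bool(set(normalize(predicted)) & set(normalize(reference)))


def accuracy(pairs: List[Tuple[str, str]],
             match: Callable[[str, str], bool]) -> float:
    """Percentage of (predicted, reference) pairs accepted by `match`."""
    if not pairs:
        return 0.0
    hits = sum(1 for predicted, reference in pairs if match(predicted, reference))
    return 100.0 * hits / len(pairs)


# Hypothetical (aligned span, reference span) pairs, not data from the corpus.
pairs = [
    ("foi detido", "foi detido"),  # identical spans
    ("detido", "foi detido"),      # partial overlap
    ("ataque", "explosão"),        # no overlap
    ("Casou", "casou"),            # differs only in case
]

exact = accuracy(pairs, exact_match)      # 2 of 4 pairs match exactly
relaxed = accuracy(pairs, relaxed_match)  # 3 of 4 pairs share a token
print(f"exact: {exact:.2f}%  relaxed: {relaxed:.2f}%")
```

Under these assumed definitions every exact match is also a relaxed match, so the relaxed score can never be lower than the exact score, which is consistent with the ordering of the two figures reported above.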
Abstract
Event extraction is an NLP task that commonly involves identifying the central word (trigger) for an event and its associated arguments in text. ACE-2005 is widely recognised as the standard corpus in this field. While other corpora, like PropBank, primarily focus on annotating predicate-argument structure, ACE-2005 provides comprehensive information about the overall event structure and semantics. However, its limited language coverage restricts its usability. This paper introduces ACE-2005-PT, a corpus created by translating ACE-2005 into Portuguese, with European and Brazilian variants. To speed up the process of obtaining ACE-2005-PT, we rely on automatic translators. This, however, poses some challenges related to automatically identifying the correct alignments between multi-word annotations in the original text and in the corresponding translated sentence. To achieve this, we…
