CrudeOilNews: An Annotated Crude Oil News Corpus for Event Extraction
Meisin Lee, Lay-Ki Soon, Eu-Gene Siew, Ly Fie Sugianto

TL;DR
CrudeOilNews is a newly created, high-quality annotated corpus of English crude oil news articles designed for event extraction, supporting economic and financial text mining research.
Contribution
This paper introduces the first annotated crude oil news corpus, detailing its creation, annotation methodology, and initial use in training event extraction models.
Findings
High annotation consistency and quality
Effective data augmentation and active learning methods
Preliminary models demonstrate usefulness for machine labeling
Abstract
In this paper, we present CrudeOilNews, a corpus of English Crude Oil news for event extraction. It is the first of its kind for Commodity News and serves to contribute towards resource building for economic and financial text mining. This paper describes the data collection process, the annotation methodology, and the event typology used in producing the corpus. First, a seed set of 175 news articles was manually annotated, of which a subset of 25 articles was used as the adjudicated reference test set for inter-annotator and system evaluation. Agreement was generally substantial and annotator performance was adequate, indicating that the annotation scheme produces consistent event annotations of high quality. Subsequently, the dataset was expanded through (1) data augmentation and (2) human-in-the-loop active learning. The resulting corpus has 425 news articles with approximately 11k…
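The abstract reports "substantial" agreement but, in the portion shown here, does not name the agreement metric. As an illustration only, the sketch below assumes Cohen's kappa computed over per-mention event type labels from two annotators. The function name `cohen_kappa` and the event labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: each item receives exactly one categorical label per
    annotator. The paper's actual agreement metric may differ.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label rates,
    # summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical event type labels for six candidate event mentions.
annotator_1 = ["price_rise", "price_fall", "supply_cut", "none", "price_rise", "none"]
annotator_2 = ["price_rise", "price_fall", "none", "none", "price_rise", "none"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Under the commonly cited Landis and Koch scale, kappa values between 0.61 and 0.80 are described as "substantial" agreement, which matches the wording used in the abstract.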
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Machine Learning in Materials Science
