EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
Maram Hasanain, Reem Suwaileh, Tamer Elsayed, Mucahid Kutlu, Hind Almerekhi

TL;DR
EveTAR is a large, multi-task, Arabic tweet test collection created around significant events, enabling diverse IR research without shared-task campaigns, and demonstrating high-quality annotations and reliable system evaluation.
Contribution
The paper introduces EveTAR, the first comprehensive Arabic tweet test collection supporting multiple IR tasks, developed through a novel, language-independent methodology around significant events.
Findings
EveTAR contains 62K annotated tweets over 50 events.
High inter-annotator agreement (Kappa 0.71) indicates annotation quality.
Evaluating existing IR algorithms on EveTAR yields reliable system rankings, comparable to those obtained on similar TREC collections.
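As a rough illustration of what a Kappa value measures, the sketch below computes Cohen's kappa for two annotators. It is not the authors' procedure: the summary does not say which Kappa variant was used, so the two-annotator setup, the function name cohen_kappa, and the toy labels are all assumptions made for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Hypothetical helper for illustration; not taken from the EveTAR paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label
    # if each annotator labeled at random with their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy relevance judgments (1 = relevant, 0 = not relevant); invented data.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)  # approximately 0.6
```

In this toy case the annotators agree on 8 of 10 items while chance agreement is 0.5, so kappa is about 0.6. A value of 0.71, as reported for EveTAR, falls in the range commonly described as substantial agreement.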
Abstract
This article introduces a new language-independent approach for creating a large-scale high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for…
