EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

Maram Hasanain; Reem Suwaileh; Tamer Elsayed; Mucahid Kutlu; Hind Almerekhi

arXiv:1708.05517·cs.IR·August 22, 2017

TL;DR

EveTAR is a large multi-task test collection of Arabic tweets built around significant events. It was created without running a shared-task campaign, supports several IR tasks, and shows high-quality annotations and reliable system evaluation.

Contribution

The paper introduces EveTAR, the first comprehensive Arabic tweet test collection supporting multiple IR tasks, developed through a novel, language-independent methodology around significant events.

Findings

01

EveTAR contains 62K annotated tweets over 50 events.

02

High inter-annotator agreement (Kappa 0.71) indicates annotation quality.

03

Existing IR algorithms show reliable performance on EveTAR, comparable to TREC collections.
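
The agreement figure in finding 02 can be made concrete with a small sketch. The summary does not say which Kappa statistic the authors used, so the code below assumes Cohen's kappa for two annotators; the function name `cohen_kappa` and the sample labels are hypothetical and are not taken from EveTAR.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: the paper's exact Kappa variant is not stated in this
    summary; this is the standard two-annotator formulation.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label rates,
    # summed over all labels that appear.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical relevance judgments (1 = relevant, 0 = not relevant).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy example the annotators agree on 8 of 10 items and chance agreement is 0.5, so kappa is about 0.6. Values above roughly 0.6 are conventionally read as substantial agreement, which is the range the reported 0.71 falls in.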

Abstract

This article introduces a new language-independent approach for creating a large-scale high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for…
