TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks
Ruofan Hu, Dongyu Zhang, Dandan Tao, Thomas Hartvigsen, Hao Feng, Elke Rundensteiner

TL;DR
TWEET-FID is the first publicly available annotated Twitter dataset designed for multiple foodborne illness detection tasks, enabling the development of machine learning models to identify outbreaks from social media data.
Contribution
The paper introduces TWEET-FID, a novel annotated dataset for foodborne illness detection on Twitter, along with methodology and baseline results for related NLP tasks.
Findings
State-of-the-art models achieve promising results on TWEET-FID
Dataset supports multiple detection tasks, including tweet relevance classification, entity mention detection, and slot filling
Provides a foundation for future research in outbreak detection from social media
Abstract
Foodborne illness is a serious but preventable public health problem -- with delays in detecting the associated outbreaks resulting in productivity loss, expensive recalls, public safety hazards, and even loss of life. While social media is a promising source for identifying unreported foodborne illnesses, there is a dearth of labeled datasets for developing effective outbreak detection models. To accelerate the development of machine learning-based models for foodborne outbreak detection, we thus present TWEET-FID (TWEET-Foodborne Illness Detection), the first publicly available annotated dataset for multiple foodborne illness incident detection tasks. TWEET-FID collected from Twitter is annotated with three facets: tweet class, entity type, and slot type, with labels produced by experts as well as by crowdsource workers. We introduce several domain tasks leveraging these three facets:…
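To make the three annotation facets concrete, here is a minimal sketch of how one annotated tweet might be represented and converted to token-level tags. The field names, the binary class encoding, the `FOOD` and `SYMPTOM` label strings, and the example tweet are all illustrative assumptions, not the dataset's actual schema or content.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    start: int   # index of the first token in the span (inclusive)
    end: int     # index one past the last token (exclusive)
    label: str   # hypothetical label name, e.g. "FOOD"


@dataclass
class AnnotatedTweet:
    tokens: List[str]
    tweet_class: int                                   # assumed: 1 = relevant, 0 = not relevant
    entities: List[Span] = field(default_factory=list)  # entity-type facet
    slots: List[Span] = field(default_factory=list)     # slot-type facet


def to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert spans to BIO tags, a common encoding for sequence labeling."""
    tags = ["O"] * n_tokens
    for span in spans:
        tags[span.start] = f"B-{span.label}"
        for i in range(span.start + 1, span.end):
            tags[i] = f"I-{span.label}"
    return tags


# Made-up example tweet, for illustration only.
example = AnnotatedTweet(
    tokens=["Got", "sick", "after", "eating", "chicken", "salad"],
    tweet_class=1,
    entities=[Span(1, 2, "SYMPTOM"), Span(4, 6, "FOOD")],
    slots=[Span(4, 6, "FOOD")],
)

entity_tags = to_bio(len(example.tokens), example.entities)
slot_tags = to_bio(len(example.tokens), example.slots)
```

In this sketch the entity facet marks every mention, while the slot facet keeps only the mentions tied to the reported incident; that distinction is an assumption about how the two facets differ.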
Code & Models
- 🤗 auro736/roberta-large-tweet-fid-TRC · model · 4 dl
- 🤗 auro736/xlm-roberta-large-tweet-fid-TRC · model · 2 dl
- 🤗 auro736/deberta-v3-large-tweet-fid-TRC · model · 4 dl
- 🤗 auro736/roberta-large-tweet-fid-EMD · model · 2 dl
- 🤗 auro736/xlm-roberta-large-tweet-fid-EMD · model · 3 dl
- 🤗 auro736/deberta-v3-large-tweet-fid-EMD · model · 3 dl
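
The checkpoints listed above could be loaded with the Hugging Face `transformers` library roughly as follows. This is a sketch under several assumptions: that the repository identifiers are the listed names without the trailing "model" badge, that TRC checkpoints are sequence classifiers (tweet relevance), and that EMD checkpoints are token classifiers (entity mentions). The helper names `tweet_fid_model_id` and `load_tweet_fid_pipeline` are hypothetical.

```python
BACKBONES = ("roberta-large", "xlm-roberta-large", "deberta-v3-large")
TASKS = ("TRC", "EMD")


def tweet_fid_model_id(backbone: str, task: str) -> str:
    """Build the assumed Hub identifier for a listed checkpoint."""
    if backbone not in BACKBONES or task not in TASKS:
        raise ValueError(f"unknown checkpoint: {backbone}/{task}")
    return f"auro736/{backbone}-tweet-fid-{task}"


def load_tweet_fid_pipeline(backbone: str, task: str):
    """Download a checkpoint and wrap it in a pipeline (requires network access)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    # Assumption: TRC is a tweet-level classifier, EMD is a token-level tagger.
    kind = "text-classification" if task == "TRC" else "token-classification"
    return pipeline(kind, model=tweet_fid_model_id(backbone, task))


model_id = tweet_fid_model_id("roberta-large", "TRC")
```

Calling `load_tweet_fid_pipeline("roberta-large", "TRC")` would then return a callable that accepts tweet text; the label names it emits depend on each checkpoint's configuration and are not documented on this page.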
Taxonomy
Topics: Data-Driven Disease Surveillance · Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies
