BAND: Biomedical Alert News Dataset
Zihao Fu, Meiru Zhang, Zaiqiao Meng, Yannan Shen, David Buckeridge, Nigel Collier

TL;DR
The BAND dataset provides a large, well-annotated collection of biomedical outbreak alerts and news, enabling improved NLP models for epidemiological analysis and disease outbreak understanding.
Contribution
This paper introduces the BAND dataset, the largest annotated corpus of biomedical outbreak news with epidemiology-related questions, facilitating advanced NLP research in disease surveillance.
Findings
Existing models can handle NER, QA, and Event Extraction tasks in epidemiology.
The dataset reveals challenges in content disguise and inference in outbreak news.
Benchmark results highlight areas for future model improvements.
Abstract
Infectious disease outbreaks continue to pose a significant threat to human health and well-being. To improve disease surveillance and understanding of disease spread, several surveillance systems have been developed to monitor daily news alerts and social media. However, existing systems lack thorough epidemiological analysis in relation to corresponding alerts or news, largely due to the scarcity of well-annotated reports data. To address this gap, we introduce the Biomedical Alert News Dataset (BAND), which includes 1,508 samples from existing reported news articles, open emails, and alerts, as well as 30 epidemiology-related questions. These questions necessitate the model's expert reasoning abilities, thereby offering valuable insights into the outbreak of the disease. The BAND dataset brings new challenges to the NLP world, requiring better disguise capability of the content and…
Taxonomy
Topics: Topic Modeling · Data-Driven Disease Surveillance · Biomedical Text Mining and Ontologies
