Claim Detection in Biomedical Twitter Posts
Amelie Wührl, Roman Klinger

TL;DR
This paper introduces a new annotated corpus of biomedical tweets to automatically detect claims, revealing high claim density and demonstrating the challenges of identifying both explicit and implicit claims using baseline models.
Contribution
It provides the first annotated dataset of biomedical social media claims and evaluates baseline models for claim detection in this domain.
Findings
45% of the tweets in the 1200-tweet biomedical corpus contain claims
Explicit claim detection is more accurate than implicit
Baseline models show moderate performance on claim detection
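As a rough illustration of how figures like the ones above are typically computed, the sketch below derives a claim density and a precision/recall/F1 score from tweet-level binary labels. It is a minimal sketch under stated assumptions: the function names, the 1 = claim / 0 = no claim encoding, and the toy labels are all hypothetical, and this page does not state which metric the authors actually report.

```python
from typing import Sequence, Tuple


def claim_density(labels: Sequence[int]) -> float:
    """Fraction of tweets labeled as containing a claim (1 = claim, 0 = no claim)."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(labels) / len(labels)


def precision_recall_f1(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive (claim) class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels invented for illustration only; they are not taken from the corpus.
gold = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical annotator labels
pred = [1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical model predictions

density = claim_density(gold)
precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"claim density: {density:.2f}")
print(f"precision: {precision:.2f}, recall: {recall:.2f}, F1: {f1:.2f}")
```

Scoring explicit and implicit claims separately would only require calling `precision_recall_f1` on the corresponding subsets of tweets.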
Abstract
Social media contains unfiltered and unique information, which is potentially of great value, but, in the case of misinformation, can also do great harm. With regard to biomedical topics, false information can be particularly dangerous. Methods of automatic fact-checking and fake news detection address this problem, but have not yet been applied to the biomedical domain in social media. We aim to fill this research gap and annotate a corpus of 1200 tweets for implicit and explicit biomedical claims (the latter also with span annotations for the claim phrase). With this corpus, which we sample to be related to COVID-19, measles, cystic fibrosis, and depression, we develop baseline models which automatically detect tweets that contain a claim. Our analyses reveal that biomedical tweets are densely populated with claims (45% in a corpus sampled to contain 1200 tweets focused on the…
