SciTweets -- A Dataset and Annotation Framework for Detecting Scientific Online Discourse
Salim Hafid, Sebastian Schellhammer, Sandra Bringay, Konstantin Todorov, Stefan Dietze

TL;DR
This paper introduces a new annotated dataset and framework for identifying and classifying scientific discourse in tweets, enabling better analysis of online scientific discussions across disciplines.
Contribution
It provides an annotation framework, a labeled dataset of 1261 tweets, and a multi-label classifier for detecting scientific relatedness and specific scientific content in social media.
Findings
The classifier achieves an F1 score of 89% in detecting science-related tweets
Expert annotation reached a Fleiss' kappa of 0.63, indicating substantial inter-annotator agreement
Classifier distinguishes claims, references, and general science-relatedness
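For readers unfamiliar with the agreement statistic reported in the findings, here is a minimal sketch of how Fleiss' kappa is computed from a table of rating counts. The function name `fleiss_kappa` and the toy rating table are illustrative assumptions; the summary does not state the number of annotators or give the authors' annotation counts, so none of the numbers below come from the paper.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    if p_e == 1.0:
        # All ratings fall in one category; kappa is undefined here.
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy table: 4 tweets, 3 raters, 2 categories
# (science-related, not science-related). Not data from the paper.
toy_counts = [
    [3, 0],
    [0, 3],
    [2, 1],
    [3, 0],
]
print(round(fleiss_kappa(toy_counts), 3))
```

Values close to 1 mean the raters agree far more often than chance would predict, while values near 0 mean agreement is no better than chance.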
Abstract
Scientific topics, claims and resources are increasingly debated as part of online discourse, where prominent examples include discourse related to COVID-19 or climate change. This has led to both significant societal impact and increased interest in scientific online discourse from various disciplines. For instance, communication studies aim at a deeper understanding of biases, quality or spreading pattern of scientific information whereas computational methods have been proposed to extract, classify or verify scientific claims using NLP and IR techniques. However, research across disciplines currently suffers from both a lack of robust definitions of the various forms of science-relatedness as well as appropriate ground truth data for distinguishing them. In this work, we contribute (a) an annotation framework and corresponding definitions for different forms of scientific relatedness…
Taxonomy
Topics: Topic Modeling · Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining
