Garbage, Glitter, or Gold: Assigning Multi-dimensional Quality Scores to Social Media Seeds for Web Archive Collections
Alexander C. Nwala, Michele C. Weigle, Michael L. Nelson

TL;DR
This paper introduces a multi-dimensional scoring framework called Quality Proxies for assessing social media seeds used in web archiving, improving seed selection by considering credibility, relevance, and other factors.
Contribution
The paper presents the QP framework that assigns comprehensive quality scores to social media seeds, integrating social media credibility with web archive considerations.
Findings
Selecting seeds by their QP scores increased precision by about 0.13.
The framework is extensible and supports multiple ranking policies.
Quality scores are explainable and applicable across domains.
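The idea of scoring a seed along several dimensions and then ranking under different policies can be illustrated with a minimal sketch. This is not the authors' implementation: the dimension names, the weighted-mean formula, the policy weights, and the example URLs are all assumptions chosen for illustration, and the paper defines its own dimensions and scoring.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Seed:
    url: str
    # Per-dimension scores, each assumed to lie in [0, 1].
    # Dimension names here are hypothetical, not the paper's.
    scores: Dict[str, float]


def quality_score(seed: Seed, weights: Dict[str, float]) -> float:
    """Weighted mean of per-dimension scores; a missing dimension counts as 0."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * seed.scores.get(dim, 0.0) for dim, w in weights.items()) / total


def explain(seed: Seed, weights: Dict[str, float]) -> Dict[str, float]:
    """Contribution of each dimension to the final score (sums to quality_score)."""
    total = sum(weights.values())
    return {dim: w * seed.scores.get(dim, 0.0) / total for dim, w in weights.items()}


def rank_seeds(seeds: List[Seed], weights: Dict[str, float]) -> List[Seed]:
    """Order seeds from highest to lowest score under one ranking policy."""
    return sorted(seeds, key=lambda s: quality_score(s, weights), reverse=True)


# Hypothetical ranking policies expressed as dimension weights.
POLICIES = {
    "reputable": {"reputation": 2.0, "credibility": 1.0},
    "popular": {"popularity": 1.0},
}

# Made-up seeds with made-up scores.
seed_a = Seed("https://example.com/a",
              {"credibility": 0.9, "relevance": 0.8, "reputation": 0.9, "popularity": 0.2})
seed_b = Seed("https://example.com/b",
              {"credibility": 0.4, "relevance": 0.7, "reputation": 0.3, "popularity": 0.95})

for name, weights in POLICIES.items():
    ranked = rank_seeds([seed_a, seed_b], weights)
    print(name, [s.url for s in ranked])
```

Under these made-up numbers the two policies order the seeds differently, and `explain` shows which dimension drove each score. Expressing a policy as a plain weight dictionary means a new policy or dimension can be added without changing the scoring code.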
Abstract
From popular uprisings to pandemics, the Web is an essential source consulted by scientists and historians for reconstructing and studying past events. Unfortunately, the Web is plagued by reference rot which causes important Web resources to disappear. Web archive collections help reduce the costly effects of reference rot by saving Web resources that chronicle important stories/events before they disappear. These collections often begin with URLs called seeds, hand-selected by experts or scraped from social media. The quality of social media content varies widely; therefore, we propose a framework for assigning multi-dimensional quality scores to social media seeds for Web archive collections about stories and events. We leveraged contributions from social media research for attributing quality to social media content and users based on credibility, reputation, and influence. We…
Taxonomy
Topics: Web Data Mining and Analysis · Misinformation and Its Impacts · Topic Modeling
