TL;DR
CRED-1 is an open, reproducible dataset combining multiple credibility signals for 2,672 domains, aimed at enabling privacy-preserving, client-side misinformation pre-bunking in browsers.
Contribution
This work introduces CRED-1, a novel multi-signal credibility dataset with an open pipeline for privacy-aware misinformation detection.
Findings
The dataset covers 2,672 domains, each assigned a composite credibility score between 0.0 and 1.0.
The pipeline is fully reproducible from publicly available sources and uses only Python standard library modules.
Dataset and code are publicly available under CC BY 4.0.
Abstract
This article presents CRED-1, an open, reproducible domain-level credibility dataset combining two openly licensed source lists (OpenSources.co and Iffy.news) with four computed enrichment signals: domain age (WHOIS/RDAP), web popularity (Tranco Top-1M), fact-check frequency (Google Fact Check Tools API), and threat intelligence (Google Safe Browsing API). The dataset covers 2,672 domains categorized as fake, unreliable, mixed, conspiracy, or satire, each assigned a composite credibility score between 0.0 and 1.0. CRED-1 is designed for on-device deployment in privacy-preserving browser extensions to enable client-side pre-bunking of misinformation at the content delivery stage. The entire pipeline is implemented in Python using only standard library modules and is fully reproducible from publicly available sources. The dataset and pipeline code are released under CC BY 4.0 and archived…
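The abstract does not state how the four enrichment signals are combined, so the following is only a minimal sketch of one way a composite score in [0.0, 1.0] could be computed and then looked up on-device. The base scores per category, the weights, the `Signals` fields, and the helper names (`composite_score`, `domain_of`, `lookup`) are all hypothetical and are not taken from the CRED-1 pipeline. In keeping with the paper's constraint, the sketch uses only the Python standard library.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical base score per source-list category (illustrative values only).
BASE_SCORE = {
    "fake": 0.10,
    "conspiracy": 0.15,
    "unreliable": 0.25,
    "mixed": 0.45,
    "satire": 0.50,
}


@dataclass(frozen=True)
class Signals:
    """Assumed per-domain record: one category plus four enrichment signals."""
    category: str
    domain_age_years: Optional[float]  # from WHOIS/RDAP, None if unknown
    tranco_rank: Optional[int]         # 1..1_000_000, None if unranked
    fact_check_count: int              # number of fact-checks found
    safe_browsing_flagged: bool        # True if flagged as a threat


def composite_score(s: Signals) -> float:
    """Combine the signals into one score in [0.0, 1.0] (assumed weights)."""
    score = BASE_SCORE[s.category]
    if s.domain_age_years is not None:
        # Older domains earn a small bonus, capped at 10 years.
        score += 0.1 * min(s.domain_age_years, 10.0) / 10.0
    if s.tranco_rank is not None:
        # More popular domains (lower rank) earn a small bonus.
        score += 0.1 * (1.0 - (s.tranco_rank - 1) / 1_000_000)
    # Each fact-check lowers the score, capped at 10 fact-checks.
    score -= 0.02 * min(s.fact_check_count, 10)
    if s.safe_browsing_flagged:
        score -= 0.2
    # Clamp to the documented range and round for compact storage.
    return round(max(0.0, min(1.0, score)), 3)


def domain_of(url: str) -> str:
    """Extract a lowercase host from an absolute URL, dropping a leading 'www.'.

    Returns an empty string when the URL has no scheme or host.
    """
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def lookup(url: str, table: Dict[str, float]) -> Optional[float]:
    """Local, offline lookup: no network request leaves the device."""
    return table.get(domain_of(url))


# Usage with a made-up domain; a real table would be loaded from the dataset.
table = {"example-hoax.test": composite_score(
    Signals("fake", None, None, 10, True))}
print(lookup("https://www.example-hoax.test/story?id=1", table))
```

Because the lookup is a plain dictionary access against a table shipped with the client, visited URLs never need to be sent to a server, which is the property the privacy-preserving deployment relies on.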
