Bridging Social Media and Search Engines: Dredge Words and the Detection of Unreliable Domains
Evan M. Williams, Peter Carragher, Kathleen M. Carley

TL;DR
This paper presents a novel system that combines webgraph and social media data to classify website credibility, introduces dredge words as indicators of unreliable sites, and achieves state-of-the-art accuracy in website credibility classification.
Contribution
The paper introduces the concept of dredge words, integrates webgraph and social media contexts using graph neural networks, and provides a new dataset for studying unreliable domains.
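To illustrate what "integrating webgraph and social media contexts" in one graph can mean, here is a rough sketch under explicit assumptions. It assumes both contexts can be expressed as domain-to-domain edges: hyperlinks from the webgraph, and hypothetical "co-shared on social media" links. It merges them into a single undirected graph and runs one unweighted mean-aggregation round, which is the neighborhood-averaging step many GNN layers build on. The function names `build_combined_graph` and `mean_aggregate`, the edge definitions, the domains, and the feature vectors are all hypothetical. This is not the authors' architecture, which would use learned weights and its own features.

```python
from collections import defaultdict

def build_combined_graph(webgraph_edges, social_edges):
    """Merge (hypothetical) hyperlink edges and social-media co-sharing edges
    into one undirected adjacency map: domain -> set of neighboring domains."""
    adj = defaultdict(set)
    for src, dst in list(webgraph_edges) + list(social_edges):
        if src != dst:  # ignore self-links
            adj[src].add(dst)
            adj[dst].add(src)
    return adj

def mean_aggregate(adj, features):
    """One message-passing round with no learned weights: each domain's new
    feature vector is the element-wise mean of its own vector and its neighbors'."""
    updated = {}
    for node, feat in features.items():
        neighbors = [features[n] for n in adj.get(node, ()) if n in features]
        vectors = [feat] + neighbors
        updated[node] = [sum(vals) / len(vectors) for vals in zip(*vectors)]
    return updated

# Usage with made-up domains and two-dimensional features.
adj = build_combined_graph(
    [("news-a.example", "shop-b.example")],   # hypothetical hyperlink edge
    [("news-a.example", "blog-c.example")],   # hypothetical co-sharing edge
)
features = {
    "news-a.example": [1.0, 0.0],
    "shop-b.example": [0.0, 1.0],
    "blog-c.example": [0.0, 0.0],
}
out = mean_aggregate(adj, features)
print(out["shop-b.example"])  # [0.5, 0.5]
```

Keeping both edge types in one adjacency map is the simplest possible choice. A real model might instead keep them as separate relation types so that each context gets its own learned transformation.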
Findings
State-of-the-art accuracy in website credibility classification
Significant improvement in top-k unreliable domain detection
Strong connections between dredge words, social media, and commerce platforms
Abstract
Proactive content moderation requires platforms to rapidly and continuously evaluate the credibility of websites. Leveraging the direct and indirect paths users follow to unreliable websites, we develop a website credibility classification and discovery system that integrates both webgraph and large-scale social media contexts. We additionally introduce the concept of dredge words, terms or phrases for which unreliable domains rank highly on search engines, and provide the first exploration of their usage on social media. Our graph neural networks, which combine webgraph and social media contexts, achieve state-of-the-art results in website credibility classification and significantly improve the top-k identification of unreliable domains. Additionally, we release a novel dataset of dredge words, highlighting their strong connections to both social media and online commerce platforms.
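The abstract defines dredge words as terms or phrases for which unreliable domains rank highly on search engines. The sketch below is one minimal way to operationalize that definition. It assumes search results have already been collected as a ranked list of URLs per query. It also assumes "ranks highly" means appearing within the top k results, where the default k=10 is an arbitrary placeholder. The functions `domain_of` and `find_dredge_words`, the threshold, and the example queries and domains are hypothetical and are not the authors' collection pipeline.

```python
from urllib.parse import urlparse

def domain_of(url):
    """Return the lower-cased hostname of a URL, with a leading 'www.' removed."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def find_dredge_words(search_results, unreliable_domains, k=10):
    """Flag queries whose top-k search results contain an unreliable domain.

    search_results: dict mapping a query string to its ranked list of result URLs.
    Returns a dict mapping each flagged query to (rank, domain) for its
    highest-ranked unreliable result; ranks start at 1.
    """
    unreliable = {d.lower() for d in unreliable_domains}
    flagged = {}
    for query, urls in search_results.items():
        for rank, url in enumerate(urls[:k], start=1):
            dom = domain_of(url)
            if dom in unreliable:
                flagged[query] = (rank, dom)
                break  # keep only the highest-ranked unreliable hit
    return flagged

# Usage with made-up queries and domains.
results = {
    "miracle cure x": ["https://www.unreliable-news.example/story", "https://health.example/page"],
    "weather today": ["https://weather.example/", "https://news.example/"],
    "cheap gadget y": ["https://shop.example/a", "https://news.example/b",
                       "https://unreliable-news.example/c"],
}
print(find_dredge_words(results, {"unreliable-news.example"}, k=2))
# {'miracle cure x': (1, 'unreliable-news.example')}
```

The choice of k controls how strict "ranks highly" is. With k=3, "cheap gadget y" would also be flagged, because the unreliable domain appears at rank 3.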
Taxonomy
Topics: Spam and Phishing Detection · Authorship Attribution and Profiling · Text and Document Classification Technologies
Methods: Graph Neural Network
