In a World That Counts: Clustering and Detecting Fake Social Engagement at Scale
Yixuan Li, Oscar Martinez, Xing Chen, Yi Li, John Hopcroft

TL;DR
This paper introduces Leas, a scalable semi-supervised clustering method that detects fake social engagement on YouTube by analyzing temporal user-video interaction graphs, reporting 98% manual-review accuracy on the YouTube comments graph and a 10x speedup over CopyCatch.
Contribution
Leas is a scalable graph diffusion approach that uses local spectral clustering, seeded with known spammers, to identify fake engagement patterns.
Findings
Achieved 98% accuracy under manual review on the YouTube comments graph
Leas runs 10 times faster than CopyCatch
Successfully deployed at Google for real-time fake engagement detection
Abstract
How can web services that depend on user-generated content distinguish fake social engagement by spammers from legitimate activity? In this paper, we focus on YouTube and the problem of identifying bad actors who post inorganic content and inflate social engagement metrics. We propose an effective method, Leas (Local Expansion at Scale), and show how fake engagement activities on YouTube can be tracked over time by analyzing the temporal graph of engagement behavior between users and YouTube videos. Using domain knowledge of spammer seeds, we formulate and tackle the problem in a semi-supervised manner, with the objective of finding individuals whose behavior patterns resemble those of the known seeds, based on a graph diffusion process via the local spectral subspace. We offer a fast, scalable MapReduce deployment…
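The seed-based idea in the abstract can be illustrated with a small sketch. This is not the Leas algorithm: it substitutes a plain lazy random walk for the diffusion in the local spectral subspace, runs in memory rather than on MapReduce, and ignores the temporal aspect. The function name `diffuse_from_seeds`, its parameters, and the toy edge list are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def diffuse_from_seeds(edges, seeds, steps=4, alpha=0.5):
    """Lazy random-walk diffusion from seed users on a bipartite user-video graph.

    Illustrative stand-in only; NOT the method from the paper.
    edges: iterable of (user_id, video_id) engagement pairs (e.g. comments).
    seeds: user ids already known to be spammers.
    Returns a dict mapping user_id -> diffusion mass after `steps` steps.
    """
    # Undirected adjacency; tag nodes so user and video ids cannot collide.
    adj = defaultdict(set)
    for user, video in edges:
        adj[("u", user)].add(("v", video))
        adj[("v", video)].add(("u", user))

    # Start with all probability mass spread uniformly over the seeds.
    mass = {("u", s): 1.0 / len(seeds) for s in seeds}

    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in mass.items():
            neighbors = adj.get(node, ())
            if not neighbors:
                nxt[node] += p  # isolated node: mass has nowhere to go
                continue
            nxt[node] += alpha * p  # lazy part stays put
            share = (1.0 - alpha) * p / len(neighbors)
            for nb in neighbors:
                nxt[nb] += share  # the rest spreads evenly to neighbors
        mass = dict(nxt)

    # Report user nodes only; videos are just intermediaries here.
    return {node[1]: p for node, p in mass.items() if node[0] == "u"}


# Toy graph: s1..s3 comment on the same two videos, o1 and o2 mostly elsewhere.
edges = [
    ("s1", "v1"), ("s1", "v2"),
    ("s2", "v1"), ("s2", "v2"),
    ("s3", "v1"), ("s3", "v2"),
    ("o1", "v3"), ("o2", "v3"), ("o2", "v2"),
]
scores = diffuse_from_seeds(edges, seeds=["s1"])
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, users who share both videos with the seed (`s2`, `s3`) receive more mass than `o2`, who shares one, and `o1`, who shares none. A real deployment would still need a rule for where to cut the ranked list; the sketch does not attempt one.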
Taxonomy
Topics: Spam and Phishing Detection · Network Security and Intrusion Detection · Complex Network Analysis Techniques
