PageRank optimization applied to spam detection
Olivier Fercoq

TL;DR
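This paper introduces MaxRank, a PageRank-based algorithm that treats link spam detection as an ergodic control problem: a random surfer chooses which hyperlinks to keep so as to minimize an average cost that penalizes visits to spam pages and hyperlink removals. On WEBSPAM-UK2007 it is reported to outperform TrustRank and AntiTrustRank.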
Contribution
The paper presents MaxRank, a new scalable algorithm that enhances spam detection by optimizing hyperlink removal policies within a PageRank framework.
Findings
MaxRank outperforms TrustRank and AntiTrustRank in spam detection accuracy.
The bias vector of the control problem serves as a per-page measure of 'spamicity', used to flag spam pages.
Experimental results on WEBSPAM-UK2007 demonstrate its scalability and effectiveness.
Abstract
We give a new link spam detection and PageRank demotion algorithm called MaxRank. Like TrustRank and AntiTrustRank, it starts with a seed of hand-picked trusted and spam pages. We define the MaxRank of a page as the frequency of visit of this page by a random surfer minimizing an average cost per time unit. On a given page, the random surfer selects a set of hyperlinks and clicks with uniform probability on any of these hyperlinks. The cost function penalizes spam pages and hyperlink removals. The goal is to determine a hyperlink deletion policy that minimizes this score. The MaxRank is interpreted as a modified PageRank vector, used to sort web pages instead of the usual PageRank vector. The bias vector of this ergodic control problem, which is unique up to an additive constant, is a measure of the "spamicity" of each page, used to detect spam pages. We give a scalable algorithm for…
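To make the control formulation in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's scalable algorithm: it uses textbook relative value iteration and brute-force enumeration of link subsets, which is exponential in the out-degree. The function name `toy_link_removal_control`, the uniform teleportation with damping 0.85, the rule that at least one link is kept, and the exact cost form (a per-page cost plus a fixed cost per removed link) are all assumptions made for illustration.

```python
from itertools import combinations

def nonempty_subsets(links):
    """Yield every non-empty subset of a page's out-links (assumed rule: keep at least one)."""
    for k in range(1, len(links) + 1):
        yield from combinations(links, k)

def toy_link_removal_control(out_links, page_cost, removal_cost=0.05,
                             damping=0.85, iters=1000, tol=1e-12):
    """Relative value iteration for a toy average-cost link-removal problem.

    Assumed model (not taken from the paper): with probability `damping` the
    surfer clicks uniformly on a kept link, otherwise teleports uniformly.
    Visiting page i costs page_cost[i]; each removed link costs removal_cost.
    Returns (average cost, bias vector normalized so h[0] == 0, kept links per page).
    """
    n = len(out_links)
    h = [0.0] * n
    policy = [tuple(out_links[i]) for i in range(n)]
    avg = 0.0
    for _ in range(iters):
        teleport = (1.0 - damping) * sum(h) / n
        new_h = []
        for i in range(n):
            best = None
            for kept in nonempty_subsets(out_links[i]):
                removed = len(out_links[i]) - len(kept)
                value = (page_cost[i] + removal_cost * removed
                         + damping * sum(h[j] for j in kept) / len(kept)
                         + teleport)
                if best is None or value < best:
                    best, policy[i] = value, kept
            new_h.append(best)
        avg = new_h[0]                       # gain estimate, since h[0] is pinned to 0
        new_h = [v - avg for v in new_h]     # renormalize the bias vector
        diff = max(abs(a - b) for a, b in zip(new_h, h))
        h = new_h
        if diff < tol:
            break
    return avg, h, policy

# Hypothetical 4-page graph: page 3 is a known spam page, page 1 links to it.
out_links = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
page_cost = [0.0, 0.0, 0.0, 1.0]
avg, h, policy = toy_link_removal_control(out_links, page_cost)
print("average cost:", avg)
print("bias (toy 'spamicity'):", h)
print("kept links per page:", policy)
```

In this toy instance the minimizing policy drops the link from page 1 to the spam page, and the spam page receives the largest bias value. The stationary distribution of the surfer under the returned policy would play the role of the modified ranking vector in this simplified setting.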
Taxonomy
Topics: Spam and Phishing Detection · Web Data Mining and Analysis · Text and Document Classification Technologies
