Web Spam Detection Using Multiple Kernels in Twin Support Vector Machine

Seyed Hamid Reza Mohammadi; Mohammad Ali Zare Chahooki

arXiv:1605.02917·cs.IR·May 11, 2016·2 cites

TL;DR

This paper enhances web spam detection accuracy by integrating multiple nonlinear kernels into Twin Support Vector Machine, demonstrating improved performance on standard datasets.

Contribution

Introduces a novel kernelized Twin SVM approach with dual kernels for each class, improving web spam detection accuracy over traditional methods.
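To make the idea of a twin classifier with a separate kernel per class concrete, here is a minimal sketch. It is not the authors' implementation: it swaps in the least-squares variant of Twin SVM, which has a closed-form solution, in place of the quadratic programs used by standard Twin SVM. The kernel choices (RBF for one plane, polynomial for the other), the parameters `c1`, `c2`, `ridge`, all function names, and the synthetic two-blob data are assumptions made for illustration only. The decision rule compares raw absolute plane outputs and omits the usual distance normalization.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian kernel matrix: exp(-gamma * ||x - y||^2)."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def poly_kernel(X, Y, degree=2, coef0=1.0):
    """Polynomial kernel matrix: (x . y + coef0)^degree."""
    return (X @ Y.T + coef0) ** degree


def fit_twin(A, B, kernel_pos, kernel_neg, c1=1.0, c2=1.0, ridge=1e-4):
    """Least-squares twin classifier (illustrative, not the paper's solver).

    A holds class +1 rows, B holds class -1 rows. Plane 1 uses kernel_pos,
    plane 2 uses kernel_neg. Each plane is a vector [u; b].
    """
    C = np.vstack([A, B])  # all training points act as kernel centers

    def augmented(kernel, X):
        # Kernel features against C, plus a column of ones for the bias.
        return np.hstack([kernel(X, C), np.ones((X.shape[0], 1))])

    eye = np.eye(C.shape[0] + 1)

    # Plane 1: output near 0 on class +1, near -1 on class -1.
    E1, F1 = augmented(kernel_pos, A), augmented(kernel_pos, B)
    z1 = -np.linalg.solve(F1.T @ F1 + E1.T @ E1 / c1 + ridge * eye,
                          F1.T @ np.ones(len(B)))

    # Plane 2: output near 0 on class -1, near +1 on class +1.
    E2, F2 = augmented(kernel_neg, A), augmented(kernel_neg, B)
    z2 = np.linalg.solve(E2.T @ E2 + F2.T @ F2 / c2 + ridge * eye,
                         E2.T @ np.ones(len(A)))
    return C, z1, z2


def predict(X, model, kernel_pos, kernel_neg):
    """Assign each row of X to the class whose plane output is closer to 0."""
    C, z1, z2 = model
    ones = np.ones((X.shape[0], 1))
    f1 = np.hstack([kernel_pos(X, C), ones]) @ z1
    f2 = np.hstack([kernel_neg(X, C), ones]) @ z2
    return np.where(np.abs(f1) <= np.abs(f2), 1, -1)


# Synthetic stand-in data: two Gaussian blobs, not a web spam dataset.
rng = np.random.default_rng(0)
A = rng.normal(loc=2.0, scale=0.5, size=(40, 2))    # class +1
B = rng.normal(loc=-2.0, scale=0.5, size=(40, 2))   # class -1

model = fit_twin(A, B, rbf_kernel, poly_kernel)
X = np.vstack([A, B])
y = np.r_[np.ones(40), -np.ones(40)]
acc = (predict(X, model, rbf_kernel, poly_kernel) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

The small `ridge` term is there only to keep the linear systems well conditioned, since each system has one more unknown than there are training points. On real page-content features, the kernels and penalties would need to be selected by validation.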

Findings

01 Effective in identifying spam pages with high accuracy

02 Outperforms existing SVM-based spam detection methods

03 Validated on UK-2007 and UK-2006 datasets

Abstract

Search engines are the most important tools for web data acquisition. Web pages are crawled and indexed by search engines, and users typically locate useful pages by querying a search engine. One of the challenges in search engine administration is spam pages, which waste search engine resources. These pages try to appear on the first page of results by deceiving search engine ranking algorithms. There are many approaches to web spam detection, such as measuring HTML code style similarity, analyzing the linguistic patterns of pages, and applying machine learning algorithms to page content features. One of the best-known algorithms used in the machine learning approach is the Support Vector Machine (SVM) classifier. Recently, the basic structure of SVM has been changed by new extensions that increase robustness and classification accuracy. In this paper we improved accuracy of web spam detection by…

Taxonomy

Topics: Spam and Phishing Detection · Text and Document Classification Technologies · Web Data Mining and Analysis

Methods: Support Vector Machine