# NetSpam: a Network-based Spam Detection Framework for Reviews in Online Social Media

**Authors:** Saeedreza Shehnepoor, Mostafa Salehi, Reza Farahbakhsh, Noel Crespi

arXiv: 1703.03609 · 2017-03-13

## TL;DR

NetSpam is a novel network-based framework that models review datasets as heterogeneous information networks and weights spam features by their importance to detect spam reviews. It outperforms existing methods on real-world review datasets.

## Contribution

Introduces NetSpam, a new framework that models review data as heterogeneous information networks and uses the importance of each spam feature type to improve spam detection accuracy.
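The summary does not give NetSpam's construction details, so the following is only a minimal, hypothetical sketch of what "modeling review data as a heterogeneous network" can look like. It assumes each review has feature values normalized to [0, 1], bins each feature into 10 levels, and links two reviews whenever they fall in the same bin of the same feature (a review, feature value, review path). The feature names, the binning scheme, and the helpers `discretize` and `build_hin` are illustrative assumptions, not the paper's definitions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each review maps feature names to values already
# normalized to [0, 1]. The feature names are placeholders, not the paper's list.
toy_reviews = {
    "r1": {"rating_deviation": 0.91, "exclamation_ratio": 0.40},
    "r2": {"rating_deviation": 0.95, "exclamation_ratio": 0.05},
    "r3": {"rating_deviation": 0.10, "exclamation_ratio": 0.42},
}


def discretize(value, levels=10):
    """Map a value in [0, 1] to an integer bin in 0..levels-1 (assumed scheme)."""
    return min(int(value * levels), levels - 1)


def build_hin(reviews, levels=10):
    """Link reviews that share a bin of the same feature.

    Returns {feature: {(review_a, review_b), ...}}; each pair stands for a
    review, feature value, review path through a shared feature node.
    """
    buckets = defaultdict(list)  # (feature, bin) -> ids of reviews in that bin
    for review_id, features in reviews.items():
        for name, value in features.items():
            buckets[(name, discretize(value, levels))].append(review_id)

    edges = defaultdict(set)
    for (name, _bin), members in buckets.items():
        # Every pair of reviews in the same bin is connected for this feature.
        for a, b in combinations(sorted(members), 2):
            edges[name].add((a, b))
    return dict(edges)


if __name__ == "__main__":
    for feature, pairs in build_hin(toy_reviews).items():
        print(feature, sorted(pairs))
```

Binning is just one simple way to turn continuous feature values into shared nodes; because every pair in a bin is linked, the number of edges grows quadratically with bin size, so a real dataset would need a more careful construction.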

## Key findings

- NetSpam outperforms existing spam detection methods.
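- Among the four feature categories (review-behavioral, user-behavioral, review-linguistic, user-linguistic), review-behavioral features perform best.
- Using feature importance yields better results across several metrics on real-world review datasets from Yelp and Amazon.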

## Abstract

Nowadays, a large share of people rely on content available in social media when making decisions (e.g., reviews and feedback on a topic or product). The fact that anybody can leave a review provides a golden opportunity for spammers to write spam reviews about products and services for various interests. Identifying these spammers and their spam content is an active research topic, and although a considerable number of studies have addressed it recently, the methodologies proposed so far still barely detect spam reviews, and none of them show the importance of each extracted feature type. In this study, we propose a novel framework, named NetSpam, which uses spam features to model review datasets as heterogeneous information networks, mapping the spam detection procedure into a classification problem in such networks. Using the importance of spam features helps us obtain better results in terms of different metrics on real-world review datasets from Yelp and Amazon. The results show that NetSpam outperforms existing methods and that, among four categories of features (review-behavioral, user-behavioral, review-linguistic, and user-linguistic), the first performs better than the others.
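The abstract says NetSpam uses the importance of spam features to turn detection into a classification problem on the network, but this summary omits the paper's actual formulas. The sketch below therefore uses an assumed rule, purely for illustration: a feature's weight is the fraction of its labeled links that join two spam reviews, and an unlabeled review's score is the importance-weighted average of the spam share among its labeled neighbours. The functions `feature_weights` and `spam_score`, the toy network, the labels, and the 0.5 decision threshold are all hypothetical.

```python
# Hypothetical toy network in the shape {feature: {(review_a, review_b), ...}}
# and partial labels (True = spam). Values are made up for illustration only.
toy_edges = {
    "rating_deviation": {("r1", "r2"), ("r2", "u1")},
    "exclamation_ratio": {("r1", "r3"), ("r3", "r4"), ("r1", "r2"), ("r3", "u1")},
}
toy_labels = {"r1": True, "r2": True, "r3": False, "r4": False}


def feature_weights(edges, labels):
    """Assumed importance rule: among a feature's links whose two ends are
    labeled, the fraction that connect two spam reviews."""
    weights = {}
    for feature, pairs in edges.items():
        labeled = [(a, b) for a, b in pairs if a in labels and b in labels]
        both_spam = sum(1 for a, b in labeled if labels[a] and labels[b])
        weights[feature] = both_spam / len(labeled) if labeled else 0.0
    return weights


def spam_score(review, edges, labels, weights):
    """Importance-weighted average, over features, of the spam share among
    the review's labeled neighbours; 0.0 when there is no evidence."""
    numerator = denominator = 0.0
    for feature, pairs in edges.items():
        weight = weights.get(feature, 0.0)
        # Collect the other endpoint of every link that touches `review`.
        neighbours = [b if a == review else a for a, b in pairs if review in (a, b)]
        known = [n for n in neighbours if n in labels]
        if known and weight > 0:
            spam_share = sum(labels[n] for n in known) / len(known)
            numerator += weight * spam_share
            denominator += weight
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    w = feature_weights(toy_edges, toy_labels)
    score = spam_score("u1", toy_edges, toy_labels, w)
    print(w, score, "spam" if score >= 0.5 else "not spam")
```

In this sketch a weight near 1 means links through that feature almost always join spam reviews, so that feature dominates the score; sorting features (or averaging them within a category such as review-behavioral) by these weights is one simple way to compare feature types.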

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.03609/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1703.03609/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/1703.03609/full.md

---
Source: https://tomesphere.com/paper/1703.03609