A Systematic Review of Machine Learning Approaches for Detecting Deceptive Activities on Social Media: Methods, Challenges, and Biases

Yunchong Liu; Xiaorui Shen; Yeyubei Zhang; Zhongyan Wang; Yexin Tian; Jianglai Dai; and Yuchen Cao

arXiv:2410.20293·cs.LG·June 24, 2025·Int. J. Data Sci. Anal.


TL;DR

This systematic review analyzes 36 studies on machine learning methods for detecting deceptive activities on social media, highlighting biases, challenges, and the need for improved evaluation metrics and data preprocessing to enhance model reliability.

Contribution

The paper provides a comprehensive assessment of ML approaches for social media deception detection, identifying common biases and proposing best practices for future research.

Findings

01

Support Vector Machines, Random Forests, and LSTM models show strong potential

02

Biases include sampling bias and inadequate handling of class imbalance

03

Evaluation often relies on accuracy, which is insufficient for imbalanced data
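A toy example makes the third finding concrete. Suppose a hypothetical dataset has 100 posts, and 5 of them are deceptive. A classifier that never flags anything still scores 95% accuracy, yet it detects none of the deceptive posts. The sketch below computes the metrics by hand in plain Python. The labels are invented for illustration and are not drawn from the reviewed studies.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN with `positive` as the deceptive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def report(y_true, y_pred):
    """Return accuracy alongside minority-class and balanced metrics."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity on deceptive posts
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical toy labels: 95 legitimate posts (0) and 5 deceptive posts (1).
y_true = [0] * 95 + [1] * 5
# Degenerate classifier that labels every post as legitimate.
always_legit = [0] * 100
print(report(y_true, always_legit))
```

Accuracy comes out at 0.95, but recall and F1 are both 0 and balanced accuracy is 0.5, which is no better than chance. Reporting minority-class or balanced metrics exposes the failure that accuracy alone hides.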

Abstract

Social media platforms like Twitter, Facebook, and Instagram have facilitated the spread of misinformation, necessitating automated detection systems. This systematic review evaluates 36 studies that apply machine learning (ML) and deep learning (DL) models to detect fake news, spam, and fake accounts on social media. Using the Prediction model Risk Of Bias ASsessment Tool (PROBAST), the review identified key biases across the ML lifecycle: selection bias due to non-representative sampling, inadequate handling of class imbalance, insufficient linguistic preprocessing (e.g., negations), and inconsistent hyperparameter tuning. Although models such as Support Vector Machines (SVM), Random Forests, and Long Short-Term Memory (LSTM) networks showed strong potential, over-reliance on accuracy as an evaluation metric in imbalanced data settings was a common flaw. The review highlights the need…
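The abstract lists insufficient linguistic preprocessing (e.g., negations) among the identified biases. One common remedy is negation-scope marking. This is not necessarily what the reviewed studies did. Tokens that follow a negation cue receive a prefix until the next punctuation mark, so a bag-of-words model sees "true" and "not true" as different features. The sketch below assumes a deliberately small, hypothetical cue list and a simple punctuation-based scope rule. Production systems typically use larger lexicons or syntactic scope detection.

```python
import re

# Hypothetical, minimal negation cue list (plus any token ending in "n't").
NEGATION_CUES = {"not", "no", "never", "cannot"}
# Tokenizer: lowercase words (with apostrophes) or single punctuation marks.
TOKEN = re.compile(r"[a-z']+|[.,!?;:]")
# Punctuation that closes a negation scope in this sketch.
SCOPE_END = {".", ",", "!", "?", ";", ":"}


def mark_negation(text):
    """Prefix tokens after a negation cue with NOT_ until the next punctuation."""
    out, negated = [], False
    for tok in TOKEN.findall(text.lower()):
        if tok in SCOPE_END:
            negated = False  # punctuation ends the current negation scope
            out.append(tok)
        elif tok in NEGATION_CUES or tok.endswith("n't"):
            negated = True  # start a new negation scope; keep the cue itself
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out


print(mark_negation("This story is not true, share it!"))
```

In this example only "true" is marked, because the comma closes the scope before "share it". A classifier trained on these tokens can therefore separate a denial from an assertion that uses the same content words.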
