# Less is More: Semi-Supervised Causal Inference for Detecting Pathogenic Users in Social Media

**Authors:** Hamidreza Alvari, Elham Shaabani, Soumajyoti Sarkar, Ghazaleh Beigi, Paulo Shakarian

arXiv: 1903.01693 · 2019-03-06

## TL;DR

This paper introduces SemiPsm, a semi-supervised causal inference framework that effectively detects Pathogenic Social Media accounts using minimal labeled data and manifold regularization, demonstrated on real-world Twitter data.

## Contribution

The paper presents a novel semi-supervised approach for PSM detection that relies solely on cascade information and unlabeled data, reducing the need for extensive feature engineering.

## Key findings

- SemiPsm achieves promising detection accuracy on ISIS-related Twitter data.
- Utilizing unlabeled data improves PSM detection performance.
- The method reduces reliance on exhaustive feature engineering.
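
To make the idea of using unlabeled data through manifold regularization concrete, here is a minimal sketch of linear Laplacian regularized least squares, one standard instance of that technique. It is not the authors' implementation: this summary does not give SemiPsm's exact formulation, so the function names, the k-nearest-neighbor graph, the hyperparameters `gamma_a` and `gamma_i`, and the synthetic two-cluster data are all assumptions made for illustration. The rows of `X` stand in for whatever per-user features are derived from cascades.

```python
import numpy as np


def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetrized kNN graph (binary weights)."""
    n = X.shape[0]
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq_dist, np.inf)          # a point is not its own neighbor
    nbrs = np.argsort(sq_dist, axis=1)[:, :k]  # indices of the k nearest points
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)                     # make the graph undirected
    return np.diag(W.sum(axis=1)) - W


def fit_linear_laprls(X, y_labeled, gamma_a=1e-2, gamma_i=1e-1, k=5):
    """Fit w minimizing squared loss on labeled rows + ridge + graph smoothness.

    Assumes the first len(y_labeled) rows of X are the labeled ones
    (labels in {-1, +1}); all remaining rows are unlabeled.
    """
    n, d = X.shape
    n_l = len(y_labeled)
    X_l = X[:n_l]
    L = knn_graph_laplacian(X, k)
    # Setting the gradient of the objective to zero gives a linear system A w = b.
    A = X_l.T @ X_l / n_l + gamma_a * np.eye(d) + (gamma_i / n**2) * (X.T @ L @ X)
    b = X_l.T @ y_labeled / n_l
    return np.linalg.solve(A, b)


# Hypothetical toy data: two clusters, only two labeled points per class.
rng = np.random.default_rng(0)
pos = rng.normal(3.0, 0.5, size=(50, 2))
neg = rng.normal(-3.0, 0.5, size=(50, 2))
X = np.vstack([pos[:2], neg[:2], pos[2:], neg[2:]])  # labeled rows first
y_labeled = np.array([1.0, 1.0, -1.0, -1.0])

w = fit_linear_laprls(X, y_labeled)
predicted = np.sign(X @ w)  # +1 / -1 prediction for every point, labeled or not
```

The graph term penalizes predictions that differ between neighboring points, which is how the unlabeled rows influence the fit even though they contribute nothing to the squared loss.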

## Abstract

Recent years have witnessed a surge of manipulation of public opinion and political events by malicious social media actors. These users are referred to as "Pathogenic Social Media (PSM)" accounts. PSMs are key users in spreading misinformation in social media to viral proportions. These accounts can be either controlled by real users or automated bots. Identification of PSMs is thus of utmost importance for social media authorities. The burden usually falls to automatic approaches that can identify these accounts and protect social media reputation. However, lack of sufficient labeled examples for devising and training sophisticated approaches to combat these accounts is still one of the foremost challenges facing social media firms. In contrast, unlabeled data is abundant and cheap to obtain thanks to massive user-generated data. In this paper, we propose a semi-supervised causal inference PSM detection framework, SemiPsm, to compensate for the lack of labeled data. In particular, the proposed method leverages unlabeled data in the form of manifold regularization and only relies on cascade information. This is in contrast to the existing approaches that use exhaustive feature engineering (e.g., profile information, network structure, etc.). Evidence from empirical experiments on a real-world ISIS-related dataset from Twitter suggests promising results of utilizing unlabeled instances for detecting PSMs.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.01693/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1903.01693/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/1903.01693/full.md

---
Source: https://tomesphere.com/paper/1903.01693