# AI-powered detection of cyberbullying in short-form video content: A hybrid deep learning framework

**Authors:** Ahmad A. Mazhar, Islam Zada, Manal Aldhayan, Seetah Alsalamah, Mashael M. Asiri, Manel Ayadi, Abdullah Alshahrani, Muhammad Shahid Anwar

PMC · DOI: 10.1371/journal.pone.0338799 · PLOS One · 2026-02-11

## TL;DR

This paper introduces a deep learning framework to detect cyberbullying in short-form videos by combining visual, audio, and text analysis.

## Contribution

A novel hybrid deep learning model that integrates CNNs, BiLSTMs, and Transformers with cross-modal alignment for cyberbullying detection.

## Key findings

- The framework achieved 91.6% accuracy and 91.3% F1-score on the CAVD and SocialVidMix benchmark datasets.
- It showed consistent performance across Instagram, TikTok, and YouTube Shorts.
- The framework's interpretability, robustness, and scalability indicate potential for real-time deployment in automated content moderation.
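
As a quick consistency check, the reported F1-score can be recomputed from the precision (89.7%) and recall (93.0%) given in the abstract, since F1 is their harmonic mean. The helper name `f1_score` below is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract, expressed as fractions.
reported_precision = 0.897
reported_recall = 0.930

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.913"
```

The result agrees with the 91.3% F1-score listed above.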

## Abstract

The explosive rise of short-form video platforms such as Instagram Reels, TikTok, and YouTube Shorts has transformed digital expression while intensifying the spread of cyberbullying. Unlike text-based harassment, video abuse conveys multimodal cues (visual, auditory, and textual) that challenge conventional detection methods. This study presents a hybrid deep-learning framework that integrates Convolutional Neural Networks (CNNs) for spatial features, Bidirectional Long Short-Term Memory (BiLSTM) networks for temporal acoustic patterns, and a Transformer-based textual encoder to analyze synchronized video, audio, and caption streams. A semantic-consistency validation layer enforces cross-modal alignment using attention-based similarity constraints, ensuring that incongruent cues are penalized during classification. Experiments on two benchmark datasets, CAVD and SocialVidMix, demonstrate state-of-the-art performance (accuracy 91.6%, precision 89.7%, recall 93.0%, and F1-score 91.3%), with consistent results across Instagram, TikTok, and YouTube Shorts. The framework’s interpretability, robustness, and scalability indicate strong potential for real-time deployment in automated content-moderation systems.
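
The idea of penalizing incongruent cues across modalities can be illustrated with a toy sketch. This summary does not give the formula of the semantic-consistency layer, so everything below is an assumption: plain cosine similarity stands in for the attention-based similarity, the penalty is averaged over modality pairs, and the names `cosine_similarity` and `consistency_penalty` are hypothetical. It is not the authors' implementation.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def consistency_penalty(embeddings):
    """Mean of (1 - cosine similarity) over all pairs of modality embeddings.

    Assumed stand-in for a cross-modal alignment constraint: the value is
    near 0 when modalities agree and grows as they diverge.
    """
    pairs = list(combinations(embeddings.values(), 2))
    if not pairs:
        return 0.0  # fewer than two modalities: nothing to compare
    return sum(1.0 - cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 2-D embeddings, one per modality.
aligned = {"visual": [1.0, 0.0], "audio": [2.0, 0.0], "text": [0.5, 0.0]}
conflicting = {"visual": [1.0, 0.0], "audio": [0.0, 1.0], "text": [1.0, 0.0]}

print(consistency_penalty(aligned))      # all modalities point the same way
print(consistency_penalty(conflicting))  # audio disagrees with visual and text
```

In a trained model, a term like this would typically be added to the classification loss with a weighting factor, so that samples whose video, audio, and caption representations disagree contribute a larger loss. How the paper weights or computes its constraint is not stated in this summary.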

## Full-text entities

- **Diseases:** aggression (MESH:D010554), AI (MESH:D060437)
- **Chemicals:** PONE-D-25-36601 (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12893605/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12893605/full.md

## References

70 references — full list in the complete paper: https://tomesphere.com/paper/PMC12893605/full.md

---
Source: https://tomesphere.com/paper/PMC12893605