Multi-modal Hate Speech Detection using Machine Learning

Fariha Tahosin Boishakhi; Ponkoj Chandra Shill; Md. Golam Rabiul Alam

arXiv:2307.11519·cs.AI·July 24, 2023

TL;DR

This paper proposes a multimodal machine learning approach to detect hate speech in videos by combining features from images, audio, and text, addressing limitations of single-modality models.

Contribution

It introduces a novel multimodal system that integrates visual, audio, and textual features for more accurate hate speech detection in videos.

Findings

01 Improved detection accuracy over single-modality models

02 Effective feature extraction from images, audio, and text

03 Demonstrated feasibility of multimodal hate speech detection

Abstract

With the continuous growth of internet users and media content, it is very hard to track down hateful speech in audio and video. Converting video or audio into text does not detect hate speech accurately, because people sometimes use hateful words in a humorous or pleasant sense, and they also vary their tone of voice or their actions in the video. State-of-the-art hate speech detection models have mostly been developed on a single modality. In this research, a combined multimodal approach is proposed to detect hate speech in video content by extracting feature images, feature values from the audio, and text, and by applying machine learning and natural language processing.
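The abstract does not say how the three feature sets are combined. As a minimal sketch, assuming feature-level (early) fusion by concatenation, the step could look like the following. The names `l2_normalize`, `fuse_features`, and `MODALITIES`, the per-modality normalization, and the feature values are illustrative assumptions, not the authors' method.

```python
import math
from typing import Dict, List, Sequence

# Fixed modality order so every fused vector has the same layout (assumed).
MODALITIES = ("image", "audio", "text")


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [float(x) for x in vec]
    return [x / norm for x in vec]


def fuse_features(features: Dict[str, Sequence[float]]) -> List[float]:
    """Early fusion: normalize each modality, then concatenate in a fixed order."""
    fused: List[float] = []
    for name in MODALITIES:
        if name not in features:
            raise KeyError(f"missing modality: {name}")
        fused.extend(l2_normalize(features[name]))
    return fused


# Hypothetical per-clip feature vectors (placeholder values, not from the paper).
clip = {
    "image": [3.0, 4.0],
    "audio": [0.0, 2.0, 0.0],
    "text": [1.0, 0.0],
}
fused = fuse_features(clip)
```

Normalizing each modality before concatenation keeps one modality with large raw values from dominating the fused vector. The fused vector would then be passed to whichever classifier is chosen.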
