OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst

Jingtao Cao; Zheng Zhang; Hongru Wang; Bin Liang; Hao Wang; Kam-Fai; Wong

arXiv:2406.09779·cs.AI·June 17, 2024

OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst

Jingtao Cao, Zheng Zhang, Hongru Wang, Bin Liang, Hao Wang, Kam-Fai, Wong

PDF

TL;DR

This paper introduces a comprehensive system combining image captioning, OCR, and large language models to detect harmful memes across multiple languages, achieving state-of-the-art performance in a Singaporean context.

Contribution

The study presents a novel multi-modal framework integrating various AI models and fine-tuning with GPT-4V data for effective harmful meme detection in multilingual settings.

Findings

01

Achieved top-1 at AI Singapore's Online Safety Prize Challenge

02

Outperformed previous benchmarks like FLAVA and VisualBERT

03

System effectively detects harmful content in four languages

Abstract

Memes, which rapidly disseminate personal opinions and positions across the internet, also pose significant challenges in propagating social bias and prejudice. This study presents a novel approach to detecting harmful memes, particularly within the multicultural and multilingual context of Singapore. Our methodology integrates image captioning, Optical Character Recognition (OCR), and Large Language Model (LLM) analysis to comprehensively understand and classify harmful memes. Utilizing the BLIP model for image captioning, PP-OCR and TrOCR for text recognition across multiple languages, and the Qwen LLM for nuanced language understanding, our system is capable of identifying harmful content in memes created in English, Chinese, Malay, and Tamil. To enhance the system's performance, we fine-tuned our approach by leveraging additional data labeled using GPT-4V, aiming to distill the…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

MethodsAttention Is All You Need · Dense Connections · Residual Connection · Softmax · Layer Normalization · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · VisualBERT · BLIP: Bootstrapping Language-Image Pre-training