Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA

Karthik Reddy Kanjula; Surya Guthikonda; Nahid Alam; Shayekh Bin Islam

arXiv:2505.06356·cs.CV·May 13, 2025

TL;DR

This paper analyzes toxicity in the LLaVA image-text dataset, identifies harmful content, and proposes mitigation strategies, resulting in a refined dataset with reduced toxic pairs to promote responsible multimodal AI development.

Contribution

It provides a detailed analysis of toxicity in a large pretraining dataset and introduces a mitigation process that removes toxic content, creating a safer dataset for multimodal models.

Findings

01 Identified and categorized common toxicity types in the LLaVA dataset.

02 Removed 7,531 toxic image-text pairs from the dataset.

03 Provided guidelines for toxicity detection and mitigation pipelines.

Abstract

Pretraining datasets are foundational to the development of multimodal models, yet they often inherit biases and toxic content from the web-scale corpora they are sourced from. In this paper, we investigate the prevalence of toxicity in the LLaVA image-text pretraining dataset, examining how harmful content manifests in different modalities. We present a comprehensive analysis of common toxicity categories and propose targeted mitigation strategies, resulting in a refined, toxicity-mitigated dataset. This dataset removes 7,531 toxic image-text pairs from the LLaVA pretraining dataset. We offer guidelines for implementing robust toxicity detection pipelines. Our findings underscore the need to actively identify and filter toxic content, such as hate speech, explicit imagery, and targeted harassment, to build more responsible and equitable multimodal systems. The…
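The filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Pair` schema, the `BLOCKLIST` tokens, the toy `text_score` function, and the 0.5 threshold are all assumptions made for illustration. A real pipeline would plug trained text and image toxicity classifiers into the two scorer arguments.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Pair:
    """One image-text pretraining example (hypothetical schema)."""
    image_id: str
    caption: str


# Hypothetical placeholder tokens, not a real toxicity lexicon.
BLOCKLIST = {"slur_a", "slur_b"}


def text_score(caption: str) -> float:
    """Toy text scorer: 1.0 if any blocklisted token appears, else 0.0.

    Stands in for a trained text toxicity classifier.
    """
    tokens = caption.lower().split()
    return 1.0 if any(t in BLOCKLIST for t in tokens) else 0.0


def filter_pairs(
    pairs: Iterable[Pair],
    text_scorer: Callable[[str], float],
    image_scorer: Callable[[str], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs into (kept, removed).

    A pair is removed when either modality scores at or above the threshold.
    """
    kept: List[Pair] = []
    removed: List[Pair] = []
    for pair in pairs:
        is_toxic = (
            text_scorer(pair.caption) >= threshold
            or image_scorer(pair.image_id) >= threshold
        )
        (removed if is_toxic else kept).append(pair)
    return kept, removed


if __name__ == "__main__":
    sample = [
        Pair("img_001", "a dog running on grass"),
        Pair("img_002", "caption containing slur_a"),
    ]
    # The image scorer here is a stub that flags nothing.
    kept, removed = filter_pairs(sample, text_score, lambda _image_id: 0.0)
    print(f"kept={len(kept)} removed={len(removed)}")
```

Flagging a pair when either modality crosses the threshold is a conservative design choice: it accepts some over-removal in exchange for not letting a toxic image pass on the strength of a benign caption, or the reverse.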

Code & Models

Repositories

nahidalam/maya
pytorch

Datasets

maya-multimodal/pretrain
dataset · 14 dl

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning · Topic Modeling