TL;DR
This paper introduces MDIT-Bench, a new benchmark for evaluating large multimodal models' sensitivity to dual-implicit toxicity, revealing current models' limitations in detecting subtle, prejudice-related toxicity.
Contribution
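The paper presents the MDIT-Dataset and MDIT-Bench, a benchmark for dual-implicit toxicity with 317,638 questions covering 12 categories, 23 subcategories, and 780 topics at three difficulty levels, along with a new toxicity metric and an evaluation of existing models' performance.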
Findings
Models struggle with dual-implicit toxicity detection.
Performance drops significantly at higher difficulty levels.
Current models contain hidden toxicity that is hard to detect.
Abstract
The widespread use of Large Multimodal Models (LMMs) has raised concerns about model toxicity. However, current research mainly focuses on explicit toxicity, with less attention to some more implicit toxicity regarding prejudice and discrimination. To address this limitation, we introduce a subtler type of toxicity named dual-implicit toxicity and a novel toxicity benchmark termed MDIT-Bench: Multimodal Dual-Implicit Toxicity Benchmark. Specifically, we first create the MDIT-Dataset with dual-implicit toxicity using the proposed Multi-stage Human-in-loop In-context Generation method. Based on this dataset, we construct the MDIT-Bench, a benchmark for evaluating the sensitivity of models to dual-implicit toxicity, with 317,638 questions covering 12 categories, 23 subcategories, and 780 topics. MDIT-Bench includes three difficulty levels, and we propose a metric to measure the toxicity…