Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions

Jin Gao; Lei Gan; Yuankai Li; Yixin Ye; Dequan Wang

arXiv:2408.01091 · cs.AI · August 6, 2024

TL;DR

This paper introduces a benchmark for evaluating large multimodal models' ability to detect self-contradictory instructions across language and vision, revealing current models' limitations and proposing a prompting method to improve dissonance recognition.

Contribution

The paper presents a new benchmark dataset for self-contradiction detection in multimodal models and a novel prompting technique to enhance their self-awareness.

Findings

01

Current LMMs struggle to detect instruction discordance.

02

The Self-Contradictory Instructions benchmark contains 20,000 conflicts.

03

Cognitive Awakening Prompting improves dissonance detection performance.

Abstract

Large multimodal models (LMMs) excel at adhering to human instructions. However, self-contradictory instructions may arise as multimodal interaction becomes more common and context lengths grow, which is particularly challenging for language beginners and vulnerable populations. We introduce the Self-Contradictory Instructions benchmark to evaluate the capability of LMMs to recognize conflicting commands. It comprises 20,000 conflicts, evenly distributed between language and vision paradigms, and is constructed by a novel automatic dataset creation framework that expedites the process and enables us to encompass a wide range of instruction forms. Our comprehensive evaluation reveals that current LMMs consistently struggle to identify multimodal instruction discordance due to a lack of self-awareness. Hence, we propose Cognitive Awakening Prompting to inject cognition from external, largely…
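The abstract frames the benchmark as a recognition task over conflicts split evenly between a language paradigm and a vision paradigm. As a hedged illustration only, the sketch below tallies a per-paradigm detection rate from per-item judgements. The Judgement record, its field names, and the "fraction flagged" scoring rule are hypothetical assumptions, not the paper's evaluation protocol or code from the linked repository.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Judgement:
    # Hypothetical record format (assumption, not from the paper).
    paradigm: str   # e.g. "language" or "vision"
    detected: bool  # True if the model flagged the instruction conflict


def detection_rates(judgements):
    """Return the fraction of conflicts flagged by the model, per paradigm."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for j in judgements:
        totals[j.paradigm] += 1
        hits[j.paradigm] += int(j.detected)
    return {p: hits[p] / totals[p] for p in totals}


# Toy data for illustration; these are not results from the paper.
toy = [
    Judgement("language", True),
    Judgement("language", False),
    Judgement("vision", False),
    Judgement("vision", False),
]
print(detection_rates(toy))  # {'language': 0.5, 'vision': 0.0}
```

Reporting rates per paradigm rather than one pooled number keeps a gap between language and vision conflicts visible, which matters when the two halves of a benchmark are the same size by construction.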

Code & Models

Repositories

shiyegao/Self-Contradictory-Instructions-SCI (official)

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques