Measuring Agreeableness Bias in Multimodal Models
Jaehyuk Lim, Bruce W. Lee

TL;DR
This study uncovers a consistent agreeableness bias in multimodal models, where pre-marked options in images influence responses, raising concerns about their reliability in critical applications.
Contribution
It systematically demonstrates how pre-marked options bias multimodal models, highlighting a previously underexplored reliability issue in these systems.
Findings
Models shift responses towards pre-marked options
Bias is consistent across different architectures
Pre-marked cues significantly affect model answers
Abstract
This paper examines a phenomenon in multimodal language models where pre-marked options in question images can significantly influence model responses. Our study employs a systematic methodology to investigate this effect: we present models with images of multiple-choice questions, which they initially answer correctly, then expose the same model to versions with pre-marked options. Our findings reveal a significant shift in the models' responses towards the pre-marked option, even when it contradicts their answers in the neutral settings. Comprehensive evaluations demonstrate that this agreeableness bias is a consistent and quantifiable behavior across various model architectures. These results show potential limitations in the reliability of these models when processing images with pre-marked options, raising important questions about their application in critical decision-making…
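The evaluation protocol described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the `Trial` record, its field names, and the `agreeableness_rate` function are hypothetical, and the restriction to pre-marked options that differ from the correct answer is an assumption about how a "shift" would be counted.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Trial:
    """One multiple-choice question shown to a model (hypothetical record)."""
    correct: str    # ground-truth option label, e.g. "A"
    neutral: str    # model's answer on the unmarked image
    premarked: str  # option that was pre-marked in the second image
    biased: str     # model's answer on the pre-marked image


def agreeableness_rate(trials: Iterable[Trial]) -> float:
    """Fraction of eligible trials where the model switches to the pre-marked option.

    A trial is eligible when the model answered correctly in the neutral
    setting and the pre-marked option is an incorrect one (assumption).
    """
    eligible = [
        t for t in trials
        if t.neutral == t.correct and t.premarked != t.correct
    ]
    if not eligible:
        return 0.0
    flipped = sum(t.biased == t.premarked for t in eligible)
    return flipped / len(eligible)
```

Filtering on neutral-setting correctness first means a switch to the pre-marked option cannot be attributed to the model simply not knowing the answer.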
Taxonomy
Topics: Visual perception and processing mechanisms · Color perception and design
