ColorFoil: Investigating Color Blindness in Large Vision and Language Models

Ahnaf Mozib Samin; M. Firoz Ahmed; Md. Mushtaq Shahriyar Rafee

arXiv:2405.11685 · cs.CV · January 7, 2025

TL;DR

This paper introduces ColorFoil, a benchmark to evaluate large vision and language models' ability to perceive colors, revealing significant gaps in their robustness and color discrimination capabilities in zero-shot settings.

Contribution

The paper presents a new benchmark, ColorFoil, for assessing color perception in vision and language (V&L) models. It evaluates seven models and highlights their strengths and weaknesses in color recognition.

Findings

1. ViLT and BridgeTower outperform the other evaluated models in color perception.
2. CLIP-based models and GroupViT struggle to differentiate distinct colors.
3. The models show limited robustness in understanding complex linguistic and visual attributes.

Abstract

Built on the Transformer architecture, large Vision and Language (V&L) models have shown promising performance even in zero-shot settings. Several studies, however, indicate that these models lack robustness when dealing with complex linguistic and visual attributes. In this work, we introduce a novel V&L benchmark, ColorFoil, which uses color-related foils to assess the models' ability to perceive colors such as red, white, and green. We evaluate seven state-of-the-art V&L models, including CLIP, ViLT, GroupViT, and BridgeTower, in a zero-shot setting and present intriguing findings. The experimental evaluation indicates that ViLT and BridgeTower have much better color perception capabilities than CLIP, its variants, and GroupViT. Moreover, CLIP-based models and GroupViT struggle to distinguish colors that are visually…
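The abstract does not spell out how the foils are built or scored. Foil benchmarks commonly pair each correct caption with a minimally edited wrong one and count how often a model scores the original caption higher than the foil. The following is a minimal sketch under that assumption. The color list, the fixed swap mapping, and the helper names (COLORS, FOIL_OF, make_foil, pairwise_accuracy, toy_score) are hypothetical. toy_score is a stand-in for a real model's zero-shot image-text score and is not part of ColorFoil.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical color vocabulary (assumption: not the official ColorFoil list).
COLORS = ["red", "white", "green", "blue", "black",
          "yellow", "brown", "orange", "pink", "gray"]

# Hypothetical foil mapping: each color maps to the next one in the list,
# so a foil never repeats the original color.
FOIL_OF = {c: COLORS[(i + 1) % len(COLORS)] for i, c in enumerate(COLORS)}

# Whole-word, case-insensitive match for any color in the vocabulary.
_COLOR_RE = re.compile(r"\b(" + "|".join(COLORS) + r")\b", re.IGNORECASE)


def make_foil(caption: str) -> Optional[str]:
    """Swap the first color word in `caption` for a different color.

    Returns None when the caption mentions no known color."""
    match = _COLOR_RE.search(caption)
    if match is None:
        return None
    replacement = FOIL_OF[match.group(0).lower()]
    return caption[:match.start()] + replacement + caption[match.end():]


def pairwise_accuracy(
    triples: Iterable[Tuple[str, str, str]],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of (image, caption, foil) triples where `score` ranks the
    true caption strictly above the foil."""
    items = list(triples)
    if not items:
        raise ValueError("need at least one (image, caption, foil) triple")
    correct = sum(score(img, cap) > score(img, foil) for img, cap, foil in items)
    return correct / len(items)


def toy_score(image_id: str, text: str) -> float:
    """Hypothetical stand-in for a V&L model: the 'image' is just an id such
    as 'img_red', and the score is 1.0 if the text names that color."""
    color = image_id.split("_", 1)[1]
    return 1.0 if re.search(rf"\b{color}\b", text, re.IGNORECASE) else 0.0


caption = "A man riding a red bicycle."
foil = make_foil(caption)
print(foil)  # A man riding a white bicycle.
print(pairwise_accuracy([("img_red", caption, foil)], toy_score))  # 1.0
```

Replacing toy_score with a real image-text similarity (for example, one computed by a CLIP checkpoint) would turn this into a zero-shot evaluation loop. The fixed swap map keeps the sketch deterministic. A real benchmark might choose foil colors differently.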

Code & Models

Repositories

samin9796/colorfoil (official)

Taxonomy

Topics: Categorization, perception, and language

Methods: Attention Is All You Need · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Adam · Dropout