IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations

Deqing Fu; Ruohao Guo; Ghazal Khalighinejad; Ollie Liu; Bhuwan Dhingra; Dani Yogatama; Robin Jia; Willie Neiswanger

arXiv:2404.01266·cs.AI·August 20, 2024·2 cites

Open Access · 1 dataset

TL;DR

IsoBench is a benchmark dataset that evaluates foundation models across different input modalities using isomorphic representations, revealing modality preferences and proposing techniques to improve multimodal performance.

Contribution

The paper introduces IsoBench, a novel benchmark with isomorphic input representations, and proposes prompting methods to enhance multimodal model capabilities.

Findings

01 Models prefer textual representations over visual ones.

02 Claude-3 Opus performs 28.7 points worse when given images instead of text.

03 The proposed prompting techniques improve performance by leveraging multiple representations.

Abstract

Current foundation models exhibit impressive capabilities when prompted either with text only or with both image and text inputs. But do their capabilities change depending on the input modality? In this work, we propose IsoBench, a benchmark dataset containing problems from four major areas: math, science, algorithms, and games. Each example is presented with multiple isomorphic representations of inputs, such as visual, textual, and mathematical presentations. IsoBench provides fine-grained feedback to diagnose performance gaps caused by the form of the representation. Across various foundation models, we observe that on the same problem, models have a consistent preference towards textual representations. Most prominently, when evaluated on all IsoBench problems, Claude-3 Opus performs 28.7 points worse when provided with images instead of text; similarly, GPT-4…
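As a rough illustration of what a gap "in points" between representations means, the sketch below computes accuracy per representation and the text-minus-image difference. The record layout (`problem_id`, `representation`, `is_correct`), the function names, and the toy records are all hypothetical; they are not the IsoBench schema, the authors' evaluation code, or real results.

```python
from collections import defaultdict


def accuracy_by_representation(records):
    """Return {representation: accuracy in percent} for graded records.

    Each record is a dict with hypothetical keys "representation" and
    "is_correct"; this layout is assumed for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["representation"]] += 1
        correct[rec["representation"]] += int(rec["is_correct"])
    return {rep: 100.0 * correct[rep] / total[rep] for rep in total}


def modality_gap(records, baseline="text", other="image"):
    """Accuracy of `baseline` minus accuracy of `other`, in percentage points."""
    acc = accuracy_by_representation(records)
    return acc[baseline] - acc[other]


# Made-up graded answers: the same four problems, each posed once as text
# and once as an image (isomorphic representations of one problem).
records = [
    {"problem_id": 1, "representation": "text", "is_correct": True},
    {"problem_id": 1, "representation": "image", "is_correct": False},
    {"problem_id": 2, "representation": "text", "is_correct": True},
    {"problem_id": 2, "representation": "image", "is_correct": True},
    {"problem_id": 3, "representation": "text", "is_correct": True},
    {"problem_id": 3, "representation": "image", "is_correct": False},
    {"problem_id": 4, "representation": "text", "is_correct": False},
    {"problem_id": 4, "representation": "image", "is_correct": False},
]

print(accuracy_by_representation(records))
print(modality_gap(records))
```

Because every problem appears in each representation, the difference isolates the effect of the input form rather than of problem difficulty, which is the comparison the abstract describes.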

Code & Models

Datasets

isobench/IsoBench
dataset · 126 dl

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Semantic Web and Ontologies

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Dense Connections · Label Smoothing