Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning

Mustafa Shukor; Alexandre Rame; Corentin Dancette; Matthieu Cord

arXiv:2310.00647·cs.CV·January 23, 2024·1 cites

TL;DR

This paper evaluates the limitations of large multimodal models across various axes, investigates the impact of in-context learning (ICL) on these flaws, and proposes new ICL variants to improve model alignment without additional training.

Contribution

The paper evaluates 10 open-source LMMs, from 3B to 80B parameters, on five axes (hallucinations, abstention, compositionality, explainability, and instruction following) and introduces novel multimodal ICL variants that address their flaws without retraining.

Findings

01. LMMs have significant flaws that are not fixed by scaling alone.

02. ICL improves explainability and abstention but has limited effect on other flaws.

03. Proposed ICL variants show promise as post-hoc solutions to mitigate model flaws.

Abstract

Following the success of Large Language Models (LLMs), Large Multimodal Models (LMMs), such as the Flamingo model and its subsequent competitors, have started to emerge as natural steps towards generalist agents. However, interacting with recent LMMs reveals major limitations that are hardly captured by current evaluation benchmarks. Indeed, task performance (e.g., VQA accuracy) alone does not provide enough clues to understand their real capabilities and limitations, or to what extent such models are aligned with human expectations. To refine our understanding of those flaws, we deviate from the current evaluation paradigm and (1) evaluate 10 recent open-source LMMs, from 3B up to 80B parameters, on 5 different axes: hallucinations, abstention, compositionality, explainability, and instruction following. Our evaluation on these axes reveals major flaws in LMMs. While the current…

Code & Models

Repositories

mshukor/EvALign-ICL (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: ALIGN