Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
Mustafa Shukor, Alexandre Rame, Corentin Dancette, Matthieu Cord

TL;DR
This paper evaluates the limitations of large multimodal models (LMMs) along five axes (hallucinations, abstention, compositionality, explainability, and instruction following), investigates how in-context learning (ICL) affects these flaws, and proposes new ICL variants to improve model alignment without additional training.
Contribution
It evaluates 10 open-source LMMs, from 3B to 80B parameters, on five axes beyond task accuracy and introduces novel multimodal ICL variants that address their flaws without retraining.
Findings
LMMs have significant flaws that are not fixed by scaling alone.
ICL improves explainability and abstention but has limited effect on other flaws.
Proposed ICL variants show promise as post-hoc solutions to mitigate model flaws.
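To make the idea of multimodal ICL concrete, here is a minimal sketch of how an interleaved few-shot prompt for abstention might be assembled. It is illustrative only and is not the authors' implementation: the `Demo` class, the `build_icl_prompt` helper, the `<image>` placeholder token, the file names, and the "Question: ... Answer: ..." template are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical placeholder marking where an image is interleaved in the text.
# Real LMMs each define their own special token and prompt template.
IMAGE_TOKEN = "<image>"


@dataclass
class Demo:
    """One in-context demonstration: an image, a question, and a target answer."""
    image_path: str
    question: str
    answer: str


def build_icl_prompt(
    demos: List[Demo], query_image: str, query_question: str
) -> Tuple[str, List[str]]:
    """Interleave demonstrations with the query.

    Returns the text prompt and the ordered list of image paths, one path
    per IMAGE_TOKEN occurrence in the prompt.
    """
    segments = []
    images = []
    for demo in demos:
        segments.append(f"{IMAGE_TOKEN}Question: {demo.question} Answer: {demo.answer}")
        images.append(demo.image_path)
    # The query ends with an open "Answer:" for the model to complete.
    segments.append(f"{IMAGE_TOKEN}Question: {query_question} Answer:")
    images.append(query_image)
    return "\n".join(segments), images


# Demonstrations mix an answerable question with an unanswerable one,
# so the context shows the model that abstaining is an acceptable output.
demos = [
    Demo("kitchen.jpg", "What color is the kettle?", "Red."),
    Demo("street.jpg", "What is the name of the driver?", "I don't know."),
]
prompt, images = build_icl_prompt(demos, "beach.jpg", "What brand is the umbrella?")
print(prompt)
```

The resulting `prompt` and `images` would then be passed to whichever LMM is being evaluated, using that model's own preprocessing. Keeping the image list separate from the text makes it easy to check that each placeholder has a matching image.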
Abstract
Following the success of Large Language Models (LLMs), Large Multimodal Models (LMMs), such as the Flamingo model and its subsequent competitors, have started to emerge as natural steps towards generalist agents. However, interacting with recent LMMs reveals major limitations that are hardly captured by the current evaluation benchmarks. Indeed, task performance (e.g., VQA accuracy) alone does not provide enough clues to understand their real capabilities, their limitations, and to what extent such models are aligned with human expectations. To refine our understanding of those flaws, we deviate from the current evaluation paradigm, and (1) evaluate 10 recent open-source LMMs, from 3B up to 80B parameter scale, on 5 different axes: hallucinations, abstention, compositionality, explainability and instruction following. Our evaluation on these axes reveals major flaws in LMMs. While the current…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: ALIGN
