Multimodal UNcommonsense: From Odd to Ordinary and Ordinary to Odd

Yejin Son; Saejin Kim; Dongjun Min; Younjae Yu

arXiv:2602.01561·cs.CV·February 3, 2026

TL;DR

This paper introduces MUN, a benchmark for evaluating multimodal commonsense reasoning in atypical scenarios, and proposes R-ICL with MER to improve model performance without extra training.

Contribution

The paper presents a new benchmark, MUN, and a retrieval-based in-context learning framework, R-ICL, to enhance reasoning in unusual multimodal contexts without additional training.

Findings

01

R-ICL improves performance by 8.3% over baseline methods.

02

MER effectively retrieves relevant exemplars in discordant multimodal pairs.

03

MUN enables evaluation of model robustness in diverse, non-typical scenarios.

Abstract

Commonsense reasoning in multimodal contexts remains a foundational challenge in artificial intelligence. We introduce Multimodal UNcommonsense (MUN), a benchmark designed to evaluate models' ability to handle scenarios that deviate from typical visual or contextual expectations. MUN pairs visual scenes with surprising or unlikely outcomes described in natural language, prompting models to either rationalize seemingly odd images using everyday logic or uncover unexpected interpretations in ordinary scenes. To support this task, we propose a retrieval-based in-context learning (R-ICL) framework that transfers reasoning capabilities from larger models to smaller ones without additional training. Leveraging a novel Multimodal Ensemble Retriever (MER), our method identifies semantically relevant exemplars even when image and text pairs are deliberately discordant. Experiments show an average…
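The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how MER combines modalities, so everything here is an assumption made for illustration: the function names `cosine` and `retrieve_exemplars`, the `image_weight` parameter, the weighted-average scoring rule, and the toy two-dimensional embeddings are all hypothetical and are not taken from the paper. The idea shown is only the general one: score each candidate exemplar on image similarity and text similarity separately, combine the two scores, and keep the top-k candidates as in-context examples.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_exemplars(query, pool, k=2, image_weight=0.5):
    """Rank pool items by a weighted mix of image and text similarity.

    Hypothetical scoring rule (an assumption, not the paper's MER):
    score = image_weight * image_sim + (1 - image_weight) * text_sim.
    `query` and each pool item are dicts holding precomputed 'image'
    and 'text' embedding vectors; pool items also carry an 'id'.
    """
    scored = []
    for item in pool:
        image_sim = cosine(query["image"], item["image"])
        text_sim = cosine(query["text"], item["text"])
        score = image_weight * image_sim + (1.0 - image_weight) * text_sim
        scored.append((score, item["id"]))
    # Highest combined score first; keep the ids of the top k.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Toy embeddings: the query image resembles "a", while its text leans toward "c",
# mimicking a deliberately discordant image-text pair.
pool = [
    {"id": "a", "image": [1.0, 0.0], "text": [1.0, 0.0]},
    {"id": "b", "image": [0.0, 1.0], "text": [1.0, 0.0]},
    {"id": "c", "image": [0.0, 1.0], "text": [0.0, 1.0]},
]
query = {"image": [1.0, 0.0], "text": [0.6, 0.8]}

top_ids = retrieve_exemplars(query, pool, k=2)
print(top_ids)
```

In an R-ICL-style pipeline, the retrieved exemplars would then be placed in the prompt of the smaller model ahead of the query. Scoring the modalities separately is one plausible way to keep a mismatched image or text from dominating the ranking, though the paper's actual retriever may work differently.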

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning