Reasoning Beyond Literal: Cross-style Multimodal Reasoning for Figurative Language Understanding
Seyyed Saeid Cheshmi, Hahnemann Ortiz, James Mooney, Dongyeop Kang

TL;DR
This paper introduces a framework for multimodal reasoning models that interpret figurative language, provide transparent reasoning, and generalize across styles, improving understanding of sarcasm, humor, and metaphors in vision-language tasks.
Contribution
The paper presents a three-step framework that enables models to interpret, reason about, and generalize across multiple figurative styles in multimodal settings, with improved performance and transparency.
Findings
Reasoning traces significantly enhance figurative understanding.
Knowledge transfers effectively between related styles like sarcasm and humor.
Joint training across styles yields a generalized model outperforming larger counterparts.
Abstract
Vision-language models (VLMs) have demonstrated strong reasoning abilities in literal multimodal tasks such as visual mathematics and science question answering. However, figurative language, such as sarcasm, humor, and metaphor, remains a significant challenge, as it conveys intent and emotion through subtle incongruities between expressed and intended meanings. In multimodal settings, accompanying images can amplify or invert textual meaning, demanding models that reason across modalities and account for subjectivity. We propose a three-step framework for developing efficient multimodal reasoning models that can (i) interpret multimodal figurative language, (ii) provide transparent reasoning traces, and (iii) generalize across multiple figurative styles. Experiments across four styles show that (1) incorporating reasoning traces substantially improves multimodal figurative…
Taxonomy
Topics: Multimodal Machine Learning Applications · Language, Metaphor, and Cognition · Sentiment Analysis and Opinion Mining
