CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?

Aashish Anantha Ramakrishnan; Aadarsh Anantha Ramakrishnan; Dongwon Lee

arXiv:2502.11300 · cs.CL · June 10, 2025

TL;DR

This paper introduces CORDIAL, a benchmark that evaluates how well multimodal large language models understand coherence relationships in discourse, and shows that current models remain limited in this respect.

Contribution

The paper presents a new benchmark, CORDIAL, for assessing MLLMs' understanding of coherence relations across multiple discourse domains, highlighting the need for discourse-aware evaluation methods.

Findings

01 Top MLLMs underperform simple classifiers in coherence tasks.

02 Current models struggle with pragmatic and intermodal relationship understanding.

03 The study advocates for discourse-driven evaluation frameworks.

Abstract

Multimodal Large Language Models (MLLMs) are renowned for their superior instruction-following and reasoning capabilities across diverse problem domains. However, existing benchmarks primarily focus on assessing factual and logical correctness in downstream tasks, with limited emphasis on evaluating MLLMs' ability to interpret pragmatic cues and intermodal relationships. To address this gap, we assess the competency of MLLMs in performing Multimodal Discourse Analysis (MDA) using Coherence Relations. Our benchmark, CORDIAL, encompasses a broad spectrum of Coherence Relations across 3 different discourse domains at varying levels of granularity. Through our experiments on 10+ MLLMs employing different prompting strategies, we show that even top models like Gemini 1.5 Pro and GPT-4o fail to match the performance of simple classifier-based baselines. This study emphasizes the need to move…

Code & Models

Repositories

aashish2000/cordial (Official)

Taxonomy

Topics: Topic Modeling

Methods: Focus · ADaptive gradient method with the OPTimal convergence rate