MuVaC: A Variational Causal Framework for Multimodal Sarcasm Understanding in Dialogues

Diandian Guo; Fangfang Yuan; Cong Cao; Xixun Lin; Chuan Zhou; Hao Peng; Yanan Cao; Yanbing Liu

arXiv:2601.20451 · cs.CL · March 31, 2026

TL;DR

MuVaC is a novel variational causal inference framework that jointly models multimodal sarcasm detection and explanation, mimicking human reasoning to improve understanding of sarcasm in dialogues.

Contribution

MuVaC introduces a causal modeling approach that integrates the detection and explanation tasks, accounting for the dependency between them to improve multimodal sarcasm understanding.

Findings

01

MuVaC outperforms existing methods on public datasets.

02

The framework effectively models causal relationships for sarcasm analysis.

03

Joint optimization improves both detection accuracy and explanation quality.

Abstract

The prevalence of sarcasm in multimodal dialogues on social platforms presents a crucial yet challenging task for understanding the true intent behind online content. Comprehensive sarcasm analysis requires two key aspects: Multimodal Sarcasm Detection (MSD) and Multimodal Sarcasm Explanation (MuSE). Intuitively, the act of detection is the result of the reasoning process that explains the sarcasm. Current research predominantly focuses on addressing either MSD or MuSE as a single task. Even though some recent work has attempted to integrate these tasks, their inherent causal dependency is often overlooked. To bridge this gap, we propose MuVaC, a variational causal inference framework that mimics human cognitive mechanisms for understanding sarcasm, enabling robust multimodal feature learning to jointly optimize MSD and MuSE. Specifically, we first model MSD and MuSE from the…
