CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models

Hongzhan Lin; Zixin Chen; Ziyang Luo; Mingfei Cheng; Jing Ma; Guang Chen

arXiv:2405.00390 · cs.CL · May 21, 2024

TL;DR

This paper introduces CofiPara, a coarse-to-fine framework leveraging large multimodal models for more accurate and explainable multimodal sarcasm target identification, addressing limitations of existing superficial methods.

Contribution

It proposes a novel coarse-to-fine paradigm that combines large multimodal reasoning with fine-tuning for improved sarcasm target detection and explainability.

Findings

01 Outperforms state-of-the-art MSTI methods

02 Enhances explainability in sarcasm detection

03 Effectively handles noise in multimodal data

Abstract

Social media abounds with multimodal sarcasm, and identifying sarcasm targets is particularly challenging due to the implicit incongruity not directly evident in the text and image modalities. Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both the text and image. This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm, by augmenting sarcasm explainability with reasoning and pre-training knowledge. Inspired by the powerful capacity of Large Multimodal Models (LMMs) on multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection. We then propose fine-tuning the model for finer-grained sarcasm…
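The coarse stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository listed under Code & Models for that), and the abstract does not specify prompts or input formats. The sketch assumes that "competing rationales" means one rationale arguing for each label; the `Sample` type, the prompt wording, the `lmm` callable, and the `stub_lmm` stand-in are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical container for one text-image pair; not from the paper.
@dataclass
class Sample:
    text: str
    image_ref: str  # assumed: a path or ID that the LMM can resolve

# Assumption: "competing" rationales argue for opposite labels.
STANCES = ("sarcastic", "not sarcastic")

def build_rationale_prompt(sample: Sample, stance: str) -> str:
    """Build an illustrative prompt asking the LMM to argue one stance."""
    return (
        f"Text: {sample.text}\n"
        f"Explain why this text-image pair is {stance}."
    )

def competing_rationales(
    sample: Sample, lmm: Callable[[str, str], str]
) -> Dict[str, str]:
    """Query the (hypothetical) LMM once per stance."""
    return {
        stance: lmm(build_rationale_prompt(sample, stance), sample.image_ref)
        for stance in STANCES
    }

def pretraining_input(sample: Sample, rationales: Dict[str, str]) -> str:
    """Concatenate the text and both rationales into one input string
    for coarse-grained sarcasm-detection pre-training of a small model."""
    parts = [f"text: {sample.text}"]
    for stance in STANCES:
        parts.append(f"rationale ({stance}): {rationales[stance]}")
    return "\n".join(parts)

def stub_lmm(prompt: str, image_ref: str) -> str:
    """Stand-in for a real LMM call so the sketch runs offline."""
    last_line = prompt.splitlines()[-1]
    stance = "not sarcastic" if "not sarcastic" in last_line else "sarcastic"
    return f"[stub rationale arguing the pair is {stance}]"

if __name__ == "__main__":
    sample = Sample(text="Great, another Monday.", image_ref="img_001.jpg")
    rationales = competing_rationales(sample, stub_lmm)
    print(pretraining_input(sample, rationales))
```

Feeding both rationales to the small model, rather than a single LMM verdict, is one plausible reading of how the framework tolerates noisy LMM output: the model must learn which rationale to trust. The fine stage (target identification) is omitted here because the abstract is truncated before describing it.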

Code & Models

Repositories

lbotirx/cofipara (PyTorch, official)

Taxonomy

Topics: Robotics and Automated Systems · Advanced Image and Video Retrieval Techniques

Methods: Focus