Adapter-state Sharing CLIP for Parameter-efficient Multimodal Sarcasm Detection
Soumyadeep Jana, Sahil Danayak, Sanasam Ranbir Singh

TL;DR
This paper introduces AdS-CLIP, a parameter-efficient multimodal sarcasm detection framework that leverages adapter-state sharing in CLIP, outperforming existing methods with fewer parameters.
Contribution
The paper proposes a novel adapter-state sharing mechanism in CLIP for sarcasm detection, enhancing cross-modal learning while reducing parameter count.
Findings
AdS-CLIP outperforms standard PEFT methods on two public sarcasm detection benchmarks.
It uses significantly fewer trainable parameters than existing baselines.
Placing adapters only in the upper layers preserves the unimodal representations in the lower layers.
Abstract
The growing prevalence of multimodal image-text sarcasm on social media poses challenges for opinion mining systems. Existing approaches rely on full fine-tuning of large models, which makes them impractical to adapt in resource-constrained settings. While recent parameter-efficient fine-tuning (PEFT) methods offer promise, their off-the-shelf use underperforms on complex tasks like sarcasm detection. We propose AdS-CLIP (Adapter-state Sharing in CLIP), a lightweight framework built on CLIP that makes two design choices. First, it inserts adapters only in the upper layers, which preserves the low-level unimodal representations in the lower layers. Second, it introduces a novel adapter-state sharing mechanism in which textual adapters guide visual ones to promote efficient cross-modal learning in the upper layers. Experiments on two public benchmarks demonstrate that AdS-CLIP not only outperforms standard PEFT methods but also existing…
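The two ideas in the abstract (adapters only in the upper layers, and textual adapters guiding visual ones) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the adapter state is shared, so the sketch assumes the text adapter's bottleneck activation is simply added to the visual adapter's bottleneck input at the same layer. The class names `Adapter` and `ToyAdsEncoder`, the layer split (`first_adapted=8` of 12), the dimensions, and the identity stand-in for frozen CLIP blocks are all hypothetical.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def init_matrix(rows, cols, rng, scale=0.02):
    """Small random weights, standing in for trainable adapter parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class Adapter:
    """Residual bottleneck adapter: x + up(relu(down(x) + shared_state))."""

    def __init__(self, dim, bottleneck, rng):
        self.down = init_matrix(bottleneck, dim, rng)
        self.up = init_matrix(dim, bottleneck, rng)

    def num_params(self):
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)

    def __call__(self, x, shared_state=None):
        h = matvec(self.down, x)
        if shared_state is not None:
            # ASSUMPTION: sharing is modeled as adding the text adapter's state.
            h = [a + b for a, b in zip(h, shared_state)]
        h = [max(0.0, a) for a in h]  # ReLU
        out = [xi + d for xi, d in zip(x, matvec(self.up, h))]
        return out, h


class ToyAdsEncoder:
    """Two towers whose adapters exist only in layers >= first_adapted."""

    def __init__(self, num_layers=12, first_adapted=8,
                 text_dim=8, image_dim=12, bottleneck=4, seed=0):
        rng = random.Random(seed)
        self.num_layers = num_layers
        self.adapted = list(range(first_adapted, num_layers))
        # Both towers use the same bottleneck size so the state can be shared
        # even though the text and image widths differ.
        self.text_adapters = {i: Adapter(text_dim, bottleneck, rng) for i in self.adapted}
        self.image_adapters = {i: Adapter(image_dim, bottleneck, rng) for i in self.adapted}

    def frozen_layer(self, x):
        # Placeholder for a frozen transformer block (identity in this toy).
        return x

    def forward(self, text, image):
        for i in range(self.num_layers):
            text = self.frozen_layer(text)
            image = self.frozen_layer(image)
            if i in self.text_adapters:
                # The text adapter runs first; its state guides the visual adapter.
                text, state = self.text_adapters[i](text)
                image, _ = self.image_adapters[i](image, shared_state=state)
        return text, image

    def trainable_params(self):
        adapters = list(self.text_adapters.values()) + list(self.image_adapters.values())
        return sum(a.num_params() for a in adapters)


encoder = ToyAdsEncoder()
text_out, image_out = encoder.forward([1.0] * 8, [1.0] * 12)
```

In this sketch the lower eight layers carry no trainable parameters at all, so their unimodal features pass through untouched, and the trainable parameter count scales only with the number of adapted layers and the bottleneck width.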
Taxonomy
Topics: Face recognition and analysis · Law, AI, and Intellectual Property · AI in cancer detection
