Teaching Sarcasm: Few-Shot Multimodal Sarcasm Detection via Distillation to a Parameter-Efficient Student
Soumyadeep Jana, Sanasam Ranbir Singh

TL;DR
This paper introduces PEKD, a framework that improves few-shot multimodal sarcasm detection by distilling knowledge from an expert teacher trained on large-scale sarcasm data into a student adapted with parameter-efficient fine-tuning, improving performance in low-resource settings.
Contribution
The paper proposes PEKD, a novel distillation framework with an entropy-aware gating mechanism to boost parameter-efficient fine-tuning for multimodal sarcasm detection in few-shot scenarios.
Findings
PEKD outperforms prior parameter-efficient methods.
PEKD surpasses large multimodal models in few-shot tasks.
The framework is modular and adaptable across models.
Abstract
Multimodal sarcasm detection is challenging, especially in low-resource settings where scarce annotated data makes subtle image-text contradictions hard to learn. Parameter-efficient fine-tuning (PEFT) methods such as adapters, LoRA, and prompt tuning reduce overfitting but struggle to reach optimal performance because few-shot data provides limited supervision. We propose PEKD, a unified framework that enhances PEFT methods via distillation from a teacher: an expert model trained on large-scale sarcasm data. To mitigate unreliable signals from the teacher, we introduce an entropy-aware gating mechanism that dynamically adjusts the distillation strength based on teacher confidence. Experiments on two public datasets demonstrate that our PEKD framework enables PEFT methods to outperform both prior parameter-efficient approaches and…
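The abstract does not give the exact form of the entropy-aware gate, so the following is only a minimal sketch of one plausible reading: the distillation term is scaled by one minus the normalized entropy of the teacher's prediction, so a confident teacher contributes more and an uncertain one contributes less. The function names, the linear gate, the KL-based distillation term, and the temperature parameter are all assumptions made for illustration, not the authors' formulation.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def confidence_gate(teacher_probs):
    """Hypothetical gate: 1 - H(teacher) / log(num_classes), in [0, 1].

    Returns roughly 0 for a uniform (maximally uncertain) teacher and
    approaches 1 as the teacher becomes confident.
    """
    max_entropy = math.log(len(teacher_probs))
    return 1.0 - entropy(teacher_probs) / max_entropy

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small epsilon for numerical safety."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def gated_distillation_loss(student_logits, teacher_logits, label,
                            temperature=1.0):
    """Cross-entropy on the gold label plus a gated distillation term.

    Assumed form: loss = CE(student, label) + gate * KL(teacher || student).
    Returns (loss, gate) so the gate value can be inspected.
    """
    student_probs = softmax(student_logits)
    ce = -math.log(student_probs[label] + 1e-12)
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    gate = confidence_gate(teacher_soft)
    return ce + gate * kl_divergence(teacher_soft, student_soft), gate

# Binary sarcasm example: class 0 = not sarcastic, class 1 = sarcastic.
loss_confident, gate_confident = gated_distillation_loss(
    student_logits=[0.2, -0.1], teacher_logits=[-4.0, 4.0], label=1)
loss_unsure, gate_unsure = gated_distillation_loss(
    student_logits=[0.2, -0.1], teacher_logits=[0.1, 0.0], label=1)
```

In this sketch the uncertain teacher yields a gate near zero, so the student falls back almost entirely on the few-shot labels, while the confident teacher's soft prediction is weighted close to fully.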
