Commander-GPT: Dividing and Routing for Multimodal Sarcasm Detection
Yazhou Zhang, Chunwang Zou, Bo Wang, Jing Qin, Prayag Tiwari

TL;DR
Commander-GPT is a modular framework that orchestrates specialized LLM agents for multimodal sarcasm detection, improving F1 score by 4.4% and 11.7% over state-of-the-art (SoTA) baselines.
Contribution
It introduces a decision routing framework with multiple types of commanders to coordinate specialized LLM agents for sarcasm detection.
Findings
Achieves 4.4% and 11.7% F1 score improvements over SoTA baselines.
Demonstrates effectiveness across MMSD and MMSD 2.0 benchmarks.
Utilizes various commander models, including lightweight and large LLMs.
Abstract
Multimodal sarcasm understanding is a high-order cognitive task. Although large language models (LLMs) have shown impressive performance on many downstream NLP tasks, growing evidence suggests that they struggle with sarcasm understanding. In this paper, we propose Commander-GPT, a modular decision routing framework inspired by military command theory. Rather than relying on a single LLM's capability, Commander-GPT orchestrates a team of specialized LLM agents, each selectively assigned to a focused sub-task such as keyword extraction or sentiment analysis. Their outputs are then routed back to the commander, which integrates the information and performs the final sarcasm judgment. To coordinate these agents, we introduce three types of centralized commanders: (1) a trained lightweight encoder-based commander (e.g., multi-modal BERT); (2) four small autoregressive…
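The divide-and-route flow described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the agents are toy rule-based functions standing in for specialized LLM agents, the image is represented by a caption string, and the names `Sample`, `route`, `commander`, and the word lists are hypothetical. The final judgment uses a simple sentiment-incongruity rule as a placeholder for the commander's integration step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    text: str
    image_caption: str  # assumption: a caption stands in for the image modality


# Toy lexicons (hypothetical); a real agent would be an LLM or a trained model.
POSITIVE = {"love", "great", "wonderful", "perfect"}
NEGATIVE = {"rain", "broken", "traffic", "late"}


def _tokens(s: str) -> List[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(".,!?").lower() for w in s.split()]


def keyword_agent(sample: Sample) -> str:
    """Stand-in for keyword extraction: keep words longer than four characters."""
    return ",".join(w for w in _tokens(sample.text) if len(w) > 4)


def text_sentiment_agent(sample: Sample) -> str:
    """Stand-in for text sentiment analysis."""
    return "positive" if set(_tokens(sample.text)) & POSITIVE else "neutral"


def image_sentiment_agent(sample: Sample) -> str:
    """Stand-in for image sentiment analysis, using the caption."""
    return "negative" if set(_tokens(sample.image_caption)) & NEGATIVE else "neutral"


AGENTS: Dict[str, Callable[[Sample], str]] = {
    "keywords": keyword_agent,
    "text_sentiment": text_sentiment_agent,
    "image_sentiment": image_sentiment_agent,
}


def route(sample: Sample) -> List[str]:
    """Hypothetical routing rule: call the image agent only if an image is present."""
    tasks = ["keywords", "text_sentiment"]
    if sample.image_caption:
        tasks.append("image_sentiment")
    return tasks


def commander(sample: Sample) -> dict:
    """Dispatch the selected sub-tasks, collect agent outputs, then decide."""
    tasks = route(sample)
    outputs = {task: AGENTS[task](sample) for task in tasks}
    # Placeholder integration: flag sarcasm when text and image sentiment clash.
    sarcastic = (
        outputs.get("text_sentiment") == "positive"
        and outputs.get("image_sentiment") == "negative"
    )
    return {"tasks": tasks, "outputs": outputs, "sarcastic": sarcastic}


if __name__ == "__main__":
    example = Sample("I love waiting in traffic!", "cars stuck in heavy traffic")
    print(commander(example))
```

In this sketch the routing decision and the final judgment both live in `commander`, mirroring the centralized design in the abstract; swapping the rule-based functions for model calls would leave the control flow unchanged.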
