Chain-of-Description: What I can understand, I can put into words

Jiaxin Guo; Daimeng Wei; Zongyao Li; Hengchao Shang; Yuanchang Luo; Hao Yang

arXiv:2502.16137·cs.CL·February 25, 2025

Open Access

TL;DR

This paper introduces Chain-of-Description Prompting, a new method for Multi-Modal Large Language Models that improves performance by encouraging detailed input descriptions before answering, validated on audio and vision benchmarks.

Contribution

The paper proposes Chain-of-Description (CoD) Prompting, a strategy that improves multi-modal model performance by having the model describe the input in detail before it answers. Applied to Qwen2-Audio, Qwen2-VL, and Qwen2.5-VL, it outperforms standard prompting.

Findings

01 Nearly 4% improvement in the speech category of the audio benchmark AIR-Bench-Chat

02 5.3% improvement on the hard-level portion of the vision benchmark MMMU_Pro

03 An ablation study further supports the effectiveness of CoD Prompting

Abstract

In this paper, we propose a novel strategy defined as Chain-of-Description (CoD) Prompting, tailored for Multi-Modal Large Language Models. This approach involves having the model first provide a detailed description of the multi-modal input before generating an answer to the question. When applied to models such as Qwen2-Audio, Qwen2-VL, and Qwen2.5-VL, CoD Prompting significantly enhances performance compared to standard prompting methods. This is demonstrated by nearly a 4% improvement in the speech category of the audio benchmark AIR-Bench-Chat and a 5.3% improvement in the hard-level portion of the vision benchmark MMMU_Pro. Our ablation study further validates the effectiveness of CoD Prompting.
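
The describe-then-answer idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the instruction wording, the function names, and the `generate` callable are all hypothetical, and the sketch assumes a single prompt that asks for the description and the answer in one turn. The paper's exact prompt text is not reproduced on this page.

```python
from typing import Callable

# Hypothetical instruction wording (an assumption, not the paper's prompt).
DESCRIBE_INSTRUCTION = "First, describe the input in detail."
ANSWER_INSTRUCTION = "Then, using your description, answer the question."


def build_standard_prompt(question: str) -> str:
    """Baseline: pass the question through unchanged (apart from trimming)."""
    return question.strip()


def build_cod_prompt(question: str) -> str:
    """CoD-style: ask for a detailed description before the answer."""
    return (
        f"{DESCRIBE_INSTRUCTION}\n"
        f"{ANSWER_INSTRUCTION}\n"
        f"Question: {question.strip()}"
    )


def answer_with_cod(generate: Callable[[str], str], question: str) -> str:
    """Send the CoD-style prompt to any text-generation callable.

    `generate` is a stand-in for a real multi-modal model call, which
    would also receive the audio clip or image alongside the text.
    """
    return generate(build_cod_prompt(question))
```

In practice, `generate` would wrap a call to a model such as Qwen2-Audio or Qwen2-VL; comparing its output against the same call with `build_standard_prompt` mirrors the standard-versus-CoD comparison the abstract describes.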

Taxonomy

Topics: Digital Humanities and Scholarship · Semantic Web and Ontologies