The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task

Yifan Wu; Pengchuan Zhang; Wenhan Xiong; Barlas Oguz; James C. Gee; Yixin Nie

arXiv:2311.09193·cs.CL·November 16, 2023·1 cites

Open Access

TL;DR

This paper investigates whether Chain-of-Thought reasoning helps complex vision-language tasks and introduces a 'Description then Decision' strategy that improves probing-task performance by 50%.

Contribution

The paper introduces a 'Description then Decision' strategy, inspired by how humans process signals, and reports a substantial performance improvement on vision-language reasoning tasks.

Findings

01

The 'Description then Decision' strategy improves performance on probing tasks by 50%.

02

The approach effectively decomposes complex reasoning in vision-language tasks.

03

Lays groundwork for future reasoning paradigm research.

Abstract

The study explores the effectiveness of the Chain-of-Thought approach, known for its proficiency in language tasks by breaking them down into sub-tasks and intermediate steps, in improving vision-language tasks that demand sophisticated perception and reasoning. We present the "Description then Decision" strategy, which is inspired by how humans process signals. This strategy significantly improves probing task performance by 50%, establishing the groundwork for future research on reasoning paradigms in complex vision-language tasks.
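
The abstract names the "Description then Decision" strategy but does not give prompts or an interface, so the following is only a minimal sketch of one way such a two-stage flow could look. Everything in it is an assumption made for illustration: the `VLM` callable type, the prompt wording, the choice to pass the image again in the second stage, and the `stub_model` stand-in are hypothetical and are not taken from the paper.

```python
from typing import Callable

# Hypothetical interface (an assumption, not from the paper): a vision-language
# model is any callable taking an image reference and a prompt, returning text.
VLM = Callable[[str, str], str]


def description_then_decision(model: VLM, image: str, question: str) -> str:
    """Two-stage prompting sketch: describe the image first, then decide."""
    # Stage 1 (description): ask only for what is visible, with no answer yet.
    description = model(image, "Describe the image in detail.")

    # Stage 2 (decision): answer the question with the description as context.
    # The prompt wording here is illustrative only.
    decision_prompt = (
        f"Image description: {description}\n"
        f"Question: {question}\n"
        "Answer based on the description."
    )
    return model(image, decision_prompt)


def stub_model(image: str, prompt: str) -> str:
    """Stand-in for a real model so the sketch runs without dependencies."""
    if prompt.startswith("Describe"):
        return "A dog is lying on a sofa."
    # Toy decision rule: answer "yes" only if the context mentions a dog.
    return "yes" if "dog" in prompt else "unknown"


if __name__ == "__main__":
    print(description_then_decision(stub_model, "img.png", "Is there an animal?"))
```

Separating the two calls keeps the perception step (what is in the image) apart from the reasoning step (what follows from it), which is the decomposition the abstract attributes to the strategy. A real model would replace `stub_model` behind the same two-argument signature.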

Taxonomy

Topics: Cognitive Science and Mapping · Categorization, perception, and language