Chain-of-Sketch: Enabling Global Visual Reasoning

Aryo Lotfi; Enrico Fini; Samy Bengio; Moin Nabi; Emmanuel Abbe

arXiv:2410.08165 · cs.LG · June 27, 2025

TL;DR

This paper introduces chain-of-sketch (CoS), a method that helps large vision models with global visual reasoning by breaking complex tasks into intermediate steps. Giving those steps a Markovian structure further improves learning efficiency and out-of-distribution generalization.

Contribution

We propose the chain-of-sketch technique with a Markovian structure, enabling better learning and generalization on global reasoning tasks in vision models.
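
As a rough illustration of what a Markovian chain of intermediate steps can look like, the toy sketch below decides reachability in a small maze by producing a sequence of frames, where each frame is computed from the previous frame alone. This is not the paper's model or training procedure: the grid encoding (`#`, `.`, `o`), the hand-written update rule, and the names `next_sketch`, `chain_of_sketch`, and `is_reached` are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical encoding for this example only:
# '#' = wall, '.' = free cell, 'o' = cell already marked in the sketch.
Frame = List[List[str]]


def parse(text: str) -> Frame:
    """Turn a multi-line string into a grid of single characters."""
    return [list(line) for line in text.strip().splitlines()]


def next_sketch(frame: Frame) -> Frame:
    """One Markovian step: the new frame depends only on the current frame."""
    rows, cols = len(frame), len(frame[0])
    new = [row[:] for row in frame]
    for r in range(rows):
        for c in range(cols):
            if frame[r][c] != ".":
                continue
            # Mark a free cell if any 4-neighbour is marked in the current frame.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and frame[nr][nc] == "o":
                    new[r][c] = "o"
                    break
    return new


def chain_of_sketch(frame: Frame, max_steps: int = 100) -> List[Frame]:
    """Apply next_sketch repeatedly until the frame stops changing."""
    frames = [frame]
    for _ in range(max_steps):
        nxt = next_sketch(frames[-1])
        if nxt == frames[-1]:
            break
        frames.append(nxt)
    return frames


def is_reached(frame: Frame, target: Tuple[int, int]) -> bool:
    """Read the global answer off the final sketch."""
    r, c = target
    return frame[r][c] == "o"


MAZE = parse(
    """
o.#.
#.#.
#..#
"""
)
frames = chain_of_sketch(MAZE)
```

In this toy setting, no single local patch of the maze reveals whether a distant cell is reachable, yet each step only needs a local update of the previous frame. A learned model would replace the hand-written `next_sketch` rule; that substitution is outside the scope of this sketch.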

Findings

01. Large vision models struggle with global reasoning tasks.
02. Chain-of-sketch improves learning efficiency and generalization.
03. Markovian structure in CoS enhances out-of-distribution performance.

Abstract

Modern vision models have achieved remarkable success in benchmarks where local features provide critical information about the target. There is now a growing interest in tackling tasks requiring more global reasoning, where local features do not provide significant information. Minsky and Papert put forward such tasks in 1969 with their connectivity study, exposing the limitations of the perceptron model. In this paper, we introduce an expanded set of global visual datasets involving graphs, strings, mazes, and image grids. We show that large vision models still struggle to learn these tasks efficiently. Similarly, state-of-the-art multi-modal LLMs perform poorly on these datasets. We explain this learning inefficiency by means of the 'globality degree' measure. To mitigate this, we propose a method called chain-of-sketch (CoS). Similar to the chain-of-thought and scratchpad techniques…

Taxonomy

Topics: Educational Tools and Methods · Online and Blended Learning