How Modality Shapes Perception and Reasoning: A Study of Error Propagation in ARC-AGI
Bo Wen, Chen Wang, Erhan Bilal

TL;DR
This paper investigates how input modalities such as text and images shape perception and reasoning in AI systems on ARC-AGI grid tasks. It finds that combining modalities improves accuracy and reliability by aligning representations with model biases.
Contribution
It provides a principled analysis of how modality shapes perception and demonstrates that multimodal alignment improves instruction accuracy and execution reliability.
Findings
Structured text yields precise coordinate perception.
Images capture 2D shapes but are resolution-sensitive.
Combining text and image improves execution accuracy by 8 points.
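To make the contrast between text encodings concrete, here is a minimal sketch of two ways to serialize a small color-quantized grid as text. The function names and output formats are hypothetical illustrations, not the paper's actual modalities. The row-wise form flattens the grid into lines of digits, while the coordinate form states each non-background cell's position explicitly.

```python
from typing import List

Grid = List[List[int]]  # each cell is a color index (assumed 0-9, as in ARC-style grids)


def to_row_text(grid: Grid) -> str:
    """Hypothetical row-wise encoding: one line of digits per grid row."""
    return "\n".join("".join(str(cell) for cell in row) for row in grid)


def to_coordinate_text(grid: Grid, background: int = 0) -> str:
    """Hypothetical structured encoding: '(row,col): color' per non-background cell."""
    lines = []
    for r, row in enumerate(grid):
        for c, color in enumerate(row):
            if color != background:
                lines.append(f"({r},{c}): {color}")
    return "\n".join(lines)


# A 3x3 example grid: a vertical bar of color 3 above a horizontal bar of color 1.
grid = [
    [0, 3, 0],
    [0, 3, 0],
    [1, 1, 1],
]

print(to_row_text(grid))
print(to_coordinate_text(grid))
```

In the row-wise form, a model has to infer vertical adjacency from character offsets across lines. In the coordinate form, positions are explicit, but the overall 2D shape is no longer visible at a glance.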
Abstract
ARC-AGI and ARC-AGI-2 measure generalization-through-composition on small color-quantized grids, and their prize competitions make progress on these harder held-out tasks a meaningful proxy for systematic generalization. Recent instruction-first systems translate grids into concise natural-language or DSL rules executed in generate-execute-select loops, yet we lack a principled account of how encodings shape model perception and how to separate instruction errors from execution errors. We hypothesize that modality imposes perceptual bottlenecks -- text flattens 2D structure into 1D tokens while images preserve layout but can introduce patch-size aliasing -- thereby shaping which grid features are reliably perceived. To test this, we isolate perception from reasoning across nine text and image modalities using a weighted set-disagreement metric and a two-stage reasoning pipeline, finding…
Taxonomy
Topics: Multimodal Machine Learning Applications · Ferroelectric and Negative Capacitance Devices · Parallel Computing and Optimization Techniques
