When 2D Tasks Meet 1D Serialization: On Serialization Friction in Structured Tasks
Chung-Hsiang Lo, Lu Li, Diji Yang, Tianyu Zhang, Yunkai Zhang, Yoshua Bengio, Yi Zhang

TL;DR
This paper investigates how linearizing 2D structured tasks into 1D sequences affects model performance, finding that preserving 2D layout improves results in synthetic tasks like matrix transpose and Game of Life.
Contribution
It introduces the concept of serialization friction and demonstrates that vision-augmented pathways outperform text-only pathways on 2D structured tasks.
Findings
Visual pathways outperform text-only pathways across tasks.
Error patterns become more spatially structured with serialization.
The performance gap between the two pathways widens at larger input dimensions.
Abstract
Large language models (LLMs) conventionally process structured inputs as 1D token sequences. While natural for prose, such linearization may introduce additional representational burden for tasks whose computation depends directly on explicit 2D structure, because row-column alignment and local neighborhoods are no longer directly expressed in the input. We study this setting, which we refer to as serialization friction, on a small diagnostic testbed of synthetic tasks with explicit 2D structure: matrix transpose, Conway's Game of Life, and LU decomposition. To examine this question, we compare a text-only language pathway over serialized inputs with a vision-augmented pathway, built on the same language backbone, that receives the same underlying content rendered in task-faithful 2D layout, yielding a system-level comparison between two end-to-end input pathways. Across the tasks and…
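To make the idea of linearization concrete, here is a minimal Python sketch of how a row-major serialization changes the distance between neighboring cells. It is an illustration only, not the paper's pipeline: the helper names (serialize_row_major, flat_offset, life_step), the newline row separator, the single-character cell encoding, and the bounded (non-wrapping) Game of Life boundary are all assumptions made for this example.

```python
def serialize_row_major(grid, sep="\n"):
    """Flatten a 2D grid of single-digit ints into one string, rows joined by `sep`.
    Hypothetical serialization; the paper's exact text format is not given here."""
    return sep.join("".join(str(cell) for cell in row) for row in grid)


def flat_offset(r, c, width, sep_len=1):
    """Character offset of cell (r, c) in the serialized string,
    assuming one character per cell and a separator of length `sep_len`."""
    return r * (width + sep_len) + c


def life_step(grid):
    """One Game of Life update on a bounded grid (cells outside count as dead).
    The boundary condition is an assumption for this sketch."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Count live cells in the 3x3 neighborhood, excluding the cell itself.
            n = sum(
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            )
            out[r][c] = 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0
    return out


blinker = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
width = len(blinker[0])

text = serialize_row_major(blinker)

# Horizontally adjacent cells stay adjacent in the string ...
horizontal_gap = flat_offset(1, 2, width) - flat_offset(1, 1, width)
# ... but vertically adjacent cells end up width + 1 characters apart.
vertical_gap = flat_offset(1, 1, width) - flat_offset(0, 1, width)

next_grid = life_step(blinker)
```

In the 2D layout every cell's eight neighbors are one step away, whereas in the serialized string the vertical gap grows linearly with the row width. A task such as Game of Life, whose update rule reads the full 3x3 neighborhood, therefore depends on positions that the 1D input places far apart once the grid is wide.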
