Learning More by Seeing Less: Structure First Learning for Efficient, Transferable, and Human-Aligned Vision
Tianqin Li, George Liu, Tai Sing Lee

TL;DR
This paper introduces a structure-first learning paradigm using line drawings to develop more efficient, generalizable, and human-aligned visual models that require less data and are more robust.
Contribution
It proposes a novel training approach that emphasizes structural representations, leading to models with stronger shape bias, lower dimensionality, and better transferability compared to traditional methods.
Findings
Models trained with line drawings have a stronger shape bias.
They exhibit lower intrinsic dimensionality, requiring fewer principal components to capture representational variance.
Student models distilled from line-drawing-trained teachers outperform those distilled from color-supervised teachers.
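To make the shape-bias finding concrete, the sketch below computes one common definition of shape bias on cue-conflict images (images whose shape and texture come from different classes): the fraction of shape decisions among all predictions that match either the shape or the texture label. This summary does not say which metric the authors used, so the definition, the function name `shape_bias`, and the toy labels are all assumptions made for illustration.

```python
def shape_bias(predictions, shape_labels, texture_labels):
    """Fraction of shape decisions among shape-or-texture decisions.

    Assumed metric (not confirmed by the source): only cue-conflict items
    count, and predictions matching neither cue are ignored.
    """
    shape_hits = 0
    texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            continue  # no cue conflict, so the item is uninformative
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1
    decided = shape_hits + texture_hits
    if decided == 0:
        raise ValueError("no prediction matched a shape or texture label")
    return shape_hits / decided


# Hypothetical toy data: two shape decisions and two texture decisions.
toy_bias = shape_bias(
    predictions=["cat", "dog", "car", "cat"],
    shape_labels=["cat", "cat", "car", "bird"],
    texture_labels=["dog", "dog", "boat", "cat"],
)
```

A value near 1 means the model mostly follows shape, and a value near 0 means it mostly follows texture.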
Abstract
Despite remarkable progress in computer vision, modern recognition systems remain fundamentally limited by their dependence on rich, redundant visual inputs. In contrast, humans can effortlessly understand sparse, minimal representations like line drawings, suggesting that structure, rather than appearance, underlies efficient visual understanding. In this work, we propose a novel structure-first learning paradigm that uses line drawings as an initial training modality to induce more compact and generalizable visual representations. We demonstrate that models trained with this approach develop a stronger shape bias, more focused attention, and greater data efficiency across classification, detection, and segmentation tasks. Notably, these models also exhibit lower intrinsic dimensionality, requiring significantly fewer principal components to capture representational variance, which…
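One way to read the claim about principal components is to count how many components are needed to reach a fixed share of the variance in a layer's features. The sketch below does this with NumPy on synthetic data. The 95% threshold, the function name `components_for_variance`, and the synthetic feature matrices are assumptions for illustration, not the authors' protocol.

```python
import numpy as np


def components_for_variance(features, threshold=0.95):
    """Smallest number of principal components whose cumulative
    explained variance reaches `threshold` (assumed to be in (0, 1])."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Squared singular values of the centered matrix are proportional
    # to the variance explained by each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(cumulative))  # guard against rounding at the tail


# Hypothetical stand-ins for model features: 500 samples, 64 dimensions.
rng = np.random.default_rng(0)
compact_features = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 64))  # rank 5
diffuse_features = rng.normal(size=(500, 64))                            # full rank

k_compact = components_for_variance(compact_features)
k_diffuse = components_for_variance(diffuse_features)
```

Under this reading, a representation with lower intrinsic dimensionality behaves like `compact_features` and yields a smaller count than `diffuse_features`.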
Taxonomy
Topics: Advanced Neural Network Applications · Face Recognition and Perception · Domain Adaptation and Few-Shot Learning
