Perceptual Inductive Bias Is What You Need Before Contrastive Learning

Tianqin Li; Junru Zhao; Dunhan Jiang; Shenghao Wu; Alan Ramirez; Tai Sing Lee

arXiv:2506.01201 · cs.CV · April 14, 2026

TL;DR

This paper proposes a multi-stage pretraining approach inspired by Marr's theory, which improves convergence speed, representation quality, and robustness in contrastive learning by incorporating perceptual inductive biases from early visual processing.

Contribution

It introduces a novel pretraining stage based on human visual perception to enhance contrastive learning, leading to faster convergence and better downstream performance.

Findings

01. Achieves 2x faster convergence on ResNet18.

02. Improves semantic segmentation, depth estimation, and object recognition results.

03. Enhances robustness and out-of-distribution generalization.

Abstract

David Marr's seminal theory of human perception stipulates that visual processing is a multi-stage process, prioritizing the derivation of boundary and surface properties before forming semantic object representations. Contrastive representation learning frameworks, however, typically bypass this explicit multi-stage approach, defining their objective as the direct learning of a semantic representation space for objects. While effective in general contexts, this approach sacrifices the inductive biases of vision, leading to slower convergence and to shortcut learning that results in texture bias. In this work, we demonstrate that leveraging Marr's multi-stage theory (first constructing boundary- and surface-level representations using perceptual constructs from early visual processing stages, then training for object semantics) leads to 2x faster convergence on ResNet18,…
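The staged ordering described in the abstract can be pictured as a simple training schedule: an early-vision stage runs to completion before the contrastive stage begins. The sketch below is only an illustration of that ordering, not the authors' implementation. The `Stage` class, the `build_schedule` helper, the objective labels, and the epoch counts are all hypothetical names and values chosen for this example; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    """One pretraining stage (hypothetical structure, for illustration only)."""
    name: str        # human-readable label for the stage
    objective: str   # label of the loss used during this stage
    epochs: int      # number of epochs to spend in this stage


def build_schedule(stages: List[Stage]) -> List[str]:
    """Expand an ordered list of stages into a per-epoch list of objectives."""
    schedule: List[str] = []
    for stage in stages:
        if stage.epochs < 0:
            raise ValueError(f"stage {stage.name!r} has negative epochs")
        # Every epoch of an earlier stage precedes every epoch of a later one.
        schedule.extend([stage.objective] * stage.epochs)
    return schedule


# Assumed epoch counts and labels; the source gives no such numbers.
STAGES = [
    Stage(name="early_vision", objective="boundary_and_surface", epochs=2),
    Stage(name="semantics", objective="contrastive", epochs=3),
]

if __name__ == "__main__":
    for epoch, objective in enumerate(build_schedule(STAGES)):
        print(epoch, objective)
```

In a real training loop, each objective label would select the corresponding loss function; keeping the schedule as plain data makes the stage boundary explicit and easy to vary.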

Videos

Perceptual Inductive Bias Is What You Need Before Contrastive Learning · SlidesLive