Deep Networks Favor Simple Data
Weyl Lu, Chenjie Hao, Yubei Chen

TL;DR
Deep neural networks consistently assign higher density to simpler data samples, revealing a preference for simplicity that persists across architectures, training regimes, and out-of-distribution scenarios.
Contribution
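The paper separates the trained network from the density estimator built on its representations or outputs, and introduces two kinds of estimators, Jacobian-based estimators and autoregressive self-estimators, that apply to a wide range of models. Using them, it shows that deep networks assign higher estimated density to simpler data across multiple architectures and datasets.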
Findings
Lower-complexity samples receive higher estimated density.
The preference for simplicity is consistent across models and datasets.
Models trained only on complex samples still rank simpler images higher in density.
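A finding of the form "lower complexity, higher density" can be checked with a rank correlation between a complexity score and a density score per sample. The sketch below is illustrative only and is not the paper's protocol: it assumes zlib-compressed size as the complexity proxy, and the helper names (compressed_size, ranks, spearman) are hypothetical. The density scores would come from whichever estimator is under study.

```python
import random
import zlib


def compressed_size(pixels: bytes) -> int:
    """Complexity proxy (an assumption): length of the zlib-compressed buffer."""
    return len(zlib.compress(pixels, 9))


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Three toy 1024-byte "images": constant, periodic ramp, uniform noise.
rng = random.Random(0)
flat = bytes([128] * 1024)
gradient = bytes(i % 256 for i in range(1024))
noise = bytes(rng.randrange(256) for _ in range(1024))
complexities = [compressed_size(img) for img in (flat, gradient, noise)]

# Sanity check only: a score that is exactly the negated complexity
# is perfectly anti-correlated with complexity. Real log-density scores
# from a model would replace this stand-in list.
stand_in_scores = [-c for c in complexities]
rho = spearman(complexities, stand_in_scores)
```

With real data, a strongly negative rho between complexity and estimated log-density would be consistent with the pattern described above. The choice of compressor and of rank correlation are design assumptions here, and other complexity measures could be substituted.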
Abstract
Estimated density is often interpreted as indicating how typical a sample is under a model. Yet deep models trained on one dataset can assign higher density to simpler out-of-distribution (OOD) data than to in-distribution test data. We refer to this behavior as the OOD anomaly. Prior work typically studies this phenomenon within a single architecture, detector, or benchmark, implicitly assuming certain canonical densities. We instead separate the trained network from the density estimator built from its representations or outputs. We introduce two estimators: Jacobian-based estimators and autoregressive self-estimators, making density analysis applicable to a wide range of models. Applying this perspective to a range of models, including iGPT, PixelCNN++, Glow, score-based diffusion models, DINOv2, and I-JEPA, we find the same striking regularity that goes beyond the OOD anomaly:…
