Linearly Separable Features in Shallow Nonlinear Networks: Width Scales Polynomially with Intrinsic Data Dimension

Alec S. Xu; Can Yaras; Peng Wang; Qing Qu

arXiv:2501.02364 · cs.LG · March 20, 2026

Open Access

TL;DR

This paper proves that a shallow nonlinear network can, with high probability, transform data lying on low-dimensional subspaces into linearly separable sets, using a width that scales polynomially with the data's intrinsic dimension. Experiments support the theory.

Contribution

The paper provides the first theoretical analysis showing that shallow nonlinear networks can achieve linear separability with a width that scales polynomially with the intrinsic dimension of the data.

Findings

01

A single layer with random weights and quadratic activations makes union-of-subspaces data linearly separable with high probability.

02

Network width scales polynomially with intrinsic data dimension, not ambient dimension.

03

Experiments corroborate the theory and suggest that linear separability also holds in settings beyond the scope of the analysis.
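
One way to see why quadratic activations pair naturally with subspace-structured data is the short calculation below. It is a reader's sketch rather than the paper's proof, and the symbols $W$, $w_j$, $c_j$, $m$, and $r$ are notation assumed here for illustration: $W \in \mathbb{R}^{m \times d}$ is the first-layer weight matrix with rows $w_j^\top$, $m$ is the width, $c_j$ are linear readout coefficients, and $r$ is the subspace dimension.

```latex
\begin{align*}
\phi(x) &= (Wx)^{\odot 2} \in \mathbb{R}^{m},
  & \phi(x)_j &= (w_j^\top x)^2 = \langle w_j w_j^\top,\; x x^\top \rangle, \\
\sum_{j=1}^{m} c_j\, \phi(x)_j
  &= x^\top \Big( \sum_{j=1}^{m} c_j\, w_j w_j^\top \Big) x .
\end{align*}
```

A linear readout on the features is therefore a quadratic form in the input, so separating two subspaces amounts to finding coefficients whose quadratic form is positive on one subspace and negative on the other. For $x$ restricted to an $r$-dimensional subspace, the matrices $x x^\top$ span a space of dimension $r(r+1)/2$, which depends on $r$ and not on the ambient dimension $d$.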

Abstract

Deep neural networks have attained remarkable success across diverse classification tasks. Recent empirical studies have shown that deep networks learn features that are linearly separable across classes. However, these findings often lack rigorous justifications, even under relatively simple settings. In this work, we address this gap by examining the linear separation capabilities of shallow nonlinear networks. Specifically, inspired by the low intrinsic dimensionality of image data, we model inputs as a union of low-dimensional subspaces (UoS) and demonstrate that a single nonlinear layer can transform such data into linearly separable sets. Theoretically, we show that this transformation occurs with high probability when using random weights and quadratic activations. Notably, we prove this can be achieved when the network width scales polynomially with the intrinsic dimension of…
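
The setting in the abstract can be tried numerically on a toy problem. The following is a minimal sketch, not the authors' code or experimental setup: it assumes two random subspaces, unit-norm samples, Gaussian random weights, and a least-squares linear readout, and every function name and default value (`run_demo`, `d=50`, `r=3`, `m=24`, `n=200`) is a hypothetical choice made for illustration.

```python
import numpy as np


def random_subspace_basis(d, r, rng):
    """Orthonormal basis (d x r) of a random r-dimensional subspace of R^d."""
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    return q


def sample_unit_points(basis, n, rng):
    """Draw n unit-norm points from the subspace spanned by `basis`."""
    coeffs = rng.standard_normal((n, basis.shape[1]))
    coeffs /= np.linalg.norm(coeffs, axis=1, keepdims=True)
    return coeffs @ basis.T  # shape (n, d)


def make_dataset(bases, n, rng):
    """Stack n points per subspace; labels are +1 and -1 for the two classes."""
    x = np.vstack([sample_unit_points(b, n, rng) for b in bases])
    y = np.concatenate([np.ones(n), -np.ones(n)])
    return x, y


def quadratic_layer(x, w):
    """One hidden layer with quadratic activation: elementwise (W x)^2."""
    return (x @ w.T) ** 2


def readout_accuracy(train_f, train_y, test_f, test_y):
    """Fit a least-squares linear readout (no bias) and score held-out points."""
    coef, *_ = np.linalg.lstsq(train_f, train_y, rcond=None)
    return float(np.mean(np.sign(test_f @ coef) == test_y))


def run_demo(d=50, r=3, m=24, n=200, seed=0):
    """Return (accuracy on quadratic features, accuracy on raw inputs)."""
    rng = np.random.default_rng(seed)
    bases = [random_subspace_basis(d, r, rng) for _ in range(2)]
    x_train, y_train = make_dataset(bases, n, rng)
    x_test, y_test = make_dataset(bases, n, rng)

    # Random, untrained first-layer weights (assumption: i.i.d. standard normal).
    w = rng.standard_normal((m, d))

    acc_quadratic = readout_accuracy(
        quadratic_layer(x_train, w), y_train, quadratic_layer(x_test, w), y_test
    )
    # Baseline with no hidden layer: a linear readout on the raw inputs.
    acc_linear = readout_accuracy(x_train, y_train, x_test, y_test)
    return acc_quadratic, acc_linear


acc_quadratic, acc_linear = run_demo()
print(f"quadratic features: {acc_quadratic:.3f}, raw inputs: {acc_linear:.3f}")
```

In this toy setup the width `m=24` is chosen relative to the subspace dimension `r=3`, not the ambient dimension `d=50`. The readout on quadratic features should separate the held-out points, while the readout on raw inputs cannot, since each subspace contains both `x` and `-x`. The specific width used here is a convenience for the demo and should not be read as the bound proved in the paper.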

Taxonomy

Topics: Neural Networks and Applications