Unveiling the Training Dynamics of ReLU Networks through a Linear Lens
Longqing Ye

TL;DR
This paper introduces a linear analytical framework for understanding ReLU neural networks by examining input-dependent effective weights, revealing how training shapes class-specific representations and decision boundaries.
Contribution
It proposes a novel method to analyze ReLU networks as input-dependent linear models, providing insights into training dynamics and representation learning.
Findings
Effective weights converge for same-class samples during training
Effective weights diverge for different-class samples
Provides a new perspective on class boundary formation
Abstract
Deep neural networks, particularly those employing Rectified Linear Units (ReLU), are often perceived as complex, high-dimensional, non-linear systems. This complexity poses a significant challenge to understanding their internal learning mechanisms. In this work, we propose a novel analytical framework that recasts a multi-layer ReLU network into an equivalent single-layer linear model with input-dependent "effective weights". For any given input sample, the activation pattern of ReLU units creates a unique computational path, effectively zeroing out a subset of weights in the network. By composing the active weights across all layers, we can derive an effective weight matrix that maps the input directly to the output for that specific sample. We posit that the evolution of these effective weights reveals fundamental principles of representation learning. Our work…
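The construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes a bias-free fully connected network with ReLU on the hidden layers only, and the names `forward`, `effective_weight`, and `w_eff` are hypothetical. For one input, the ReLU activation pattern is recorded as a 0/1 mask per hidden layer, and the masked layer weights are multiplied together into a single matrix.

```python
import numpy as np


def forward(weights, x):
    """Ordinary forward pass of a bias-free ReLU MLP (assumed architecture)."""
    h = x
    for w in weights[:-1]:
        h = np.maximum(w @ h, 0.0)  # ReLU on hidden layers
    return weights[-1] @ h  # linear output layer


def effective_weight(weights, x):
    """Compose the active weights for input x into one matrix w_eff.

    For this particular x, forward(weights, x) equals w_eff @ x.
    """
    w_eff = np.eye(x.shape[0])
    h = x
    for w in weights[:-1]:
        z = w @ h
        mask = (z > 0).astype(z.dtype)  # activation pattern of this layer
        h = z * mask
        # Rows belonging to inactive units are zeroed out.
        w_eff = mask[:, None] * (w @ w_eff)
    return weights[-1] @ w_eff


# Hypothetical 4-8-8-3 network with random weights, for illustration only.
rng = np.random.default_rng(0)
weights = [
    rng.standard_normal((8, 4)),
    rng.standard_normal((8, 8)),
    rng.standard_normal((3, 8)),
]
x = rng.standard_normal(4)
w_eff = effective_weight(weights, x)
y = forward(weights, x)
```

Under these assumptions `w_eff @ x` reproduces the network output for `x`, while a different input with a different activation pattern generally yields a different `w_eff`. With biases the same idea gives an affine map, so an effective bias term would have to be tracked alongside the matrix.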
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
