XConv: Low-memory stochastic backpropagation for convolutional layers
Anirudh Thatipelli, Jeffrey Sam, Mathias Louboutin, Ali Siahkoohi, Rongrong Wang, Felix J. Herrmann

TL;DR
XConv is a memory-efficient drop-in replacement for convolutional layers. It preserves standard backpropagation, reduces memory usage by more than 50%, and maintains competitive performance across tasks by exploiting the algebraic structure of convolutional layer gradients together with randomized trace estimation.
Contribution
The paper presents XConv, a low-memory convolutional layer that integrates into existing frameworks as a drop-in replacement, comes with convergence guarantees, and performs comparably to training with exact gradients.
Findings
Reduces memory usage by a factor of two or more.
Achieves performance comparable to exact gradient methods.
Maintains computational efficiency with optimized convolution implementations.
Abstract
Training convolutional neural networks at scale demands substantial memory, largely due to storing intermediate activations for backpropagation. Existing approaches -- such as checkpointing, invertible architectures, or gradient approximation methods like randomized automatic differentiation -- either incur significant computational overhead, impose architectural constraints, or require non-trivial codebase modifications. We propose XConv, a drop-in replacement for standard convolutional layers that addresses all three limitations: it preserves standard backpropagation, imposes no architectural constraints, and integrates seamlessly into existing codebases. XConv exploits the algebraic structure of convolutional layer gradients, storing highly compressed activations and approximating weight gradients via multi-channel randomized trace estimation. We establish convergence guarantees and…
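The idea of approximating weight gradients by randomized trace estimation can be illustrated on a simplified case. The sketch below is not the authors' implementation: it assumes a dense layer (equivalently a 1x1 convolution) whose exact weight gradient is X^T dY, and the function names `hutchinson_trace`, `compress_activation`, and `approx_weight_grad` are hypothetical. Each entry of X^T dY can be written as a trace, so it can be estimated with random probing vectors Z satisfying E[Z Z^T] = I. The forward pass then only needs to keep the small sketch Z^T X and a random seed instead of the full activation X.

```python
import numpy as np


def hutchinson_trace(A, num_probes, rng):
    """Estimate tr(A) as the mean of z^T A z over Rademacher probes z."""
    n = A.shape[0]
    Z = rng.choice([-1.0, 1.0], size=(n, num_probes))
    # Column k of Z is one probe; sum over rows gives z_k^T A z_k.
    return float(np.mean(np.sum(Z * (A @ Z), axis=0)))


def compress_activation(X, num_probes, seed):
    """Forward pass: keep the (r, c_in) sketch Z^T X instead of the (n, c_in) X.

    Z is regenerated from `seed` in the backward pass, so it is not stored.
    """
    Z = np.random.default_rng(seed).standard_normal((X.shape[0], num_probes))
    return Z.T @ X


def approx_weight_grad(sketch, dY, seed):
    """Backward pass: X^T dY is approximated by (Z^T X)^T (Z^T dY) / r.

    The estimate is unbiased because E[Z Z^T] = r * I for Gaussian probes.
    """
    r = sketch.shape[0]
    Z = np.random.default_rng(seed).standard_normal((dY.shape[0], r))
    return sketch.T @ (Z.T @ dY) / r


# Small demonstration with made-up sizes (assumptions, not from the paper).
rng = np.random.default_rng(0)
n, c_in, c_out, r = 1024, 8, 8, 128
X = rng.standard_normal((n, c_in))            # activations to be compressed
dY = X @ rng.standard_normal((c_in, c_out))   # stand-in for the output gradient

sketch = compress_activation(X, r, seed=42)
grad_approx = approx_weight_grad(sketch, dY, seed=42)
grad_exact = X.T @ dY

rel_err = np.linalg.norm(grad_approx - grad_exact) / np.linalg.norm(grad_exact)
print("stored values: full =", X.size, " sketch =", sketch.size)
print("relative gradient error:", rel_err)
```

The number of probes r trades memory for gradient accuracy: the stored sketch shrinks with r while the variance of the estimate grows roughly as 1/r. How XConv extends this to spatial convolution offsets and multiple channels is described in the paper itself and is not reproduced here.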
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
