XConv: Low-memory stochastic backpropagation for convolutional layers

Anirudh Thatipelli; Jeffrey Sam; Mathias Louboutin; Ali Siahkoohi; Rongrong Wang; Felix J. Herrmann

arXiv:2106.06998·cs.LG·March 11, 2026



TL;DR

XConv is a drop-in, memory-efficient replacement for standard convolutional layers. It preserves standard backpropagation, cuts memory usage by a factor of two or more, and stays competitive with exact-gradient training across tasks by exploiting the algebraic structure of convolutional layer gradients and randomized trace estimation.

Contribution

The paper presents XConv, a low-memory convolutional layer that integrates into existing codebases as a drop-in replacement, comes with convergence guarantees, and performs comparably to exact-gradient methods in experiments.

Findings

01. Reduces memory usage by a factor of two or more.
02. Achieves performance comparable to exact gradient methods.
03. Maintains computational efficiency with optimized convolution implementations.

Abstract

Training convolutional neural networks at scale demands substantial memory, largely due to storing intermediate activations for backpropagation. Existing approaches -- such as checkpointing, invertible architectures, or gradient approximation methods like randomized automatic differentiation -- either incur significant computational overhead, impose architectural constraints, or require non-trivial codebase modifications. We propose XConv, a drop-in replacement for standard convolutional layers that addresses all three limitations: it preserves standard backpropagation, imposes no architectural constraints, and integrates seamlessly into existing codebases. XConv exploits the algebraic structure of convolutional layer gradients, storing highly compressed activations and approximating weight gradients via multi-channel randomized trace estimation. We establish convergence guarantees and…
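To make the idea of randomized trace estimation concrete, here is a minimal NumPy sketch of a Hutchinson-style estimator for tr(XᵀY). It is not the XConv implementation: the function names (make_probes, compress, estimate_trace), the array shapes, and the choice of Rademacher probes are all assumptions made for illustration. The point it shows is that only a small projection of X needs to be kept, rather than X itself, to estimate an inner product that is formed later against Y.

```python
import numpy as np


def make_probes(m, r, rng):
    """Draw r Rademacher probe vectors of length m (entries are +1 or -1).

    Hypothetical helper; the probe distribution is an assumption.
    """
    return rng.choice([-1.0, 1.0], size=(m, r))


def compress(x, probes):
    """Project an (n, m) array onto the probes, giving an (n, r) sketch."""
    return x @ probes


def estimate_trace(x_sketch, y, probes):
    """Estimate tr(x.T @ y) from the sketch of x, the full y, and the probes.

    Uses tr(x.T @ y) = E[(x z) . (y z)] for probes z with E[z z.T] = I,
    averaged over the r probe vectors.
    """
    r = probes.shape[1]
    return float(np.sum(x_sketch * (y @ probes)) / r)


rng = np.random.default_rng(0)
n, m, r = 8, 64, 16                  # r < m, so the sketch is smaller than x

x = rng.standard_normal((n, m))      # stands in for a stored activation
y = rng.standard_normal((n, m))      # stands in for a quantity seen later
probes = make_probes(m, r, rng)

x_sketch = compress(x, probes)       # keep this (n, r) array instead of x
exact = float(np.trace(x.T @ y))
approx = estimate_trace(x_sketch, y, probes)

print("stored entries:", x_sketch.size, "instead of", x.size)
print("exact trace:", exact)
print("estimate:   ", approx)
```

The estimate is unbiased but noisy for small r; increasing the number of probes trades memory for accuracy. In this toy setting the probes are kept in memory, whereas a practical scheme would more likely regenerate them from a stored random seed.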


Code & Models

Repositories

slimgroup/XConv (PyTorch, official)


Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques