TensorGRaD: Tensor Gradient Robust Decomposition for Memory-Efficient Neural Operator Training
Sebastian Loeschcke, David Pitt, Robert Joseph George, Jiawei Zhao, Cheng Luo, Yuandong Tian, Jean Kossaifi, Anima Anandkumar

TL;DR
TensorGRaD introduces a memory-efficient tensor gradient decomposition method that significantly reduces memory usage in neural operator training, enabling high-resolution PDE solutions with maintained or improved accuracy.
Contribution
The paper presents a novel robust tensor decomposition approach for gradient compression, specifically designed for tensor-parameterized neural operators, with theoretical guarantees and practical efficiency.
Findings
Reduces memory usage by over 50% in neural operator training.
Achieves comparable or improved accuracy on PDE tasks, including turbulent Navier-Stokes.
Provides a theoretical advantage over matrix-based gradient compression methods.
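Where optimizer-state savings can come from is easiest to see with a back-of-the-envelope count. The sketch below compares Adam's two dense moment tensors with a hypothetical compressed layout: both moments kept in a Tucker core, shared factor matrices, and a small sparse part. The shape, ranks, density, and accounting rules are all illustrative assumptions, not values from the paper. The count covers one weight tensor's optimizer state only (in stored numbers, not bytes), so it does not reproduce, and should not be compared with, the authors' reported over-50% reduction.

```python
from math import prod

def dense_adam_state(shape):
    # Adam keeps two moment tensors (first and second), each the size of the weight.
    return 2 * prod(shape)

def compressed_adam_state(shape, ranks, density):
    # Hypothetical accounting, counted in stored numbers rather than bytes:
    # - both moments live in the Tucker core space (2 * prod(ranks)),
    # - the per-mode factor matrices are stored once and shared by both moments,
    # - the sparse part keeps k flat indices plus two moment values per index.
    core = prod(ranks)
    factors = sum(n * r for n, r in zip(shape, ranks))
    k = int(density * prod(shape))
    return 2 * core + factors + 3 * k

shape = (64, 64, 16, 16)   # hypothetical 4-way weight tensor
ranks = (16, 16, 8, 8)     # hypothetical Tucker ranks
dense = dense_adam_state(shape)
compressed = compressed_adam_state(shape, ranks, density=0.01)
print(dense, compressed, round(compressed / dense, 4))
```

The design choice that matters is which pieces scale with the full tensor size: only the sparse part does, so its density, together with the Tucker ranks, controls how much optimizer memory is actually saved.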
Abstract
Scientific problems require resolving multi-scale phenomena across different resolutions and learning solution operators in infinite-dimensional function spaces. Neural operators provide a powerful framework for this, using tensor-parameterized layers to capture complex, multi-dimensional relationships. However, scaling neural operators to high-resolution problems leads to significant computational demands, making the training of industrial-scale models prohibitive. In this work, we introduce TensorGRaD, a novel method that directly addresses the memory challenges associated with optimizing large tensor-structured weights. Our approach, based on a robust tensor decomposition, factorizes gradients as the sum of a low-rank tensor and a sparse one to efficiently capture information within optimizer states, including outliers. Additionally, we provide a recipe for mixed…
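The abstract's central idea, writing a gradient tensor as a low-rank tensor plus a sparse one, can be illustrated with a minimal NumPy sketch. It assumes a simple two-step scheme: keep the largest-magnitude entries as the sparse "outlier" part, then fit a truncated Tucker decomposition (via HOSVD) to the residual. The helper names (`sparse_part`, `hosvd`, `robust_decompose`), the ordering of the two steps, and the rank and density values are hypothetical; this is not the authors' implementation, and the paper's actual decomposition and update schedule may differ.

```python
import numpy as np

# Hypothetical sketch of a "low-rank + sparse" gradient decomposition.
# Not the TensorGRaD reference implementation: the ordering (sparse first,
# then Tucker on the residual) and all parameter values are assumptions.

def unfold(t, mode):
    # Mode-n unfolding: bring axis `mode` to the front and flatten the rest.
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_dot(t, m, mode):
    # Multiply tensor `t` along axis `mode` by matrix `m` of shape (new_dim, t.shape[mode]).
    res = np.tensordot(m, np.moveaxis(t, mode, 0), axes=(1, 0))
    return np.moveaxis(res, 0, mode)

def sparse_part(g, density):
    # Keep the k largest-magnitude entries (the outliers) and zero the rest.
    k = max(1, int(density * g.size))
    idx = np.argpartition(np.abs(g).ravel(), -k)[-k:]
    s = np.zeros(g.size, dtype=g.dtype)
    s[idx] = g.ravel()[idx]
    return s.reshape(g.shape)

def hosvd(t, ranks):
    # Truncated HOSVD: per-mode orthonormal factors from the leading left singular
    # vectors of each unfolding, then project `t` onto them to get the Tucker core.
    factors = [np.linalg.svd(unfold(t, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = t
    for m, u in enumerate(factors):
        core = mode_dot(core, u.T, m)
    return core, factors

def tucker_to_tensor(core, factors):
    # Rebuild the full tensor from a Tucker core and its per-mode factors.
    t = core
    for m, u in enumerate(factors):
        t = mode_dot(t, u, m)
    return t

def robust_decompose(g, ranks, density):
    # g ≈ S + L, with S sparse (outliers) and L a Tucker tensor with the given ranks.
    s = sparse_part(g, density)
    core, factors = hosvd(g - s, ranks)
    return s, core, factors

# Usage on a synthetic "gradient": an exactly low-rank tensor plus a few large spikes.
rng = np.random.default_rng(0)
shape, ranks = (8, 8, 8), (2, 2, 2)
qs = [np.linalg.qr(rng.standard_normal((n, r)))[0] for n, r in zip(shape, ranks)]
low_rank = tucker_to_tensor(rng.standard_normal(ranks), qs)
spikes = np.zeros(low_rank.size)
spikes[rng.choice(low_rank.size, size=5, replace=False)] = 50.0
g = low_rank + spikes.reshape(shape)

s, core, factors = robust_decompose(g, ranks, density=5 / g.size)
robust_err = np.linalg.norm(g - s - tucker_to_tensor(core, factors)) / np.linalg.norm(g)

core_only, factors_only = hosvd(g, ranks)  # low-rank fit with no sparse part
lowrank_err = np.linalg.norm(g - tucker_to_tensor(core_only, factors_only)) / np.linalg.norm(g)
print(f"low-rank + sparse: {robust_err:.4f}, low-rank only: {lowrank_err:.4f}")
```

The comparison at the end shows why a sparse term helps: a few large outliers cannot be represented by a small Tucker core, so a purely low-rank fit spends its rank budget on them, while the sparse part absorbs them and leaves a residual that the low-rank part fits well.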
Taxonomy
Topics: Tensor decomposition and applications · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques
