COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Jinqi Xiao, Shen Sang, Tiancheng Zhi, Jing Liu, Qing Yan, Yuqian Zhang, Linjie Luo, Bo Yuan

TL;DR
COAP is a novel gradient projection method that significantly reduces memory usage during training of large neural networks by considering inter-projection correlations, outperforming existing low-rank methods in efficiency and performance.
Contribution
Introduces COAP, a correlation-aware gradient projection technique that minimizes computational overhead and enhances training efficiency for large-scale neural networks.
Findings
Reduces optimizer memory by 61% on LLaMA-1B with minimal time increase.
Achieves an 81% optimizer memory reduction and a 4x speedup over GaLore, with higher accuracy, when fine-tuning LLaVA-v1.5-7B with 8-bit quantization.
Maintains training performance comparable to AdamW across various tasks.
Abstract
Training large-scale neural networks in vision and multimodal domains demands substantial memory resources, primarily due to the storage of optimizer states. While LoRA, a popular parameter-efficient method, reduces memory usage, it often suffers from suboptimal performance due to the constraints of low-rank updates. Low-rank gradient projection methods (e.g., GaLore, Flora) reduce optimizer memory by projecting gradients and moment estimates into low-rank spaces via singular value decomposition or random projection. However, they fail to account for inter-projection correlation, causing performance degradation, and their projection strategies often incur high computational costs. In this paper, we present COAP (Correlation-Aware Gradient Projection), a memory-efficient method that minimizes computational overhead while maintaining training performance. Evaluated across various vision,…
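The abstract describes low-rank gradient projection only in words, so a small sketch may help make the memory argument concrete. The code below is NOT COAP's correlation-aware method; it is a generic illustration of the SVD-based projection idea the abstract attributes to prior work such as GaLore, written under simplifying assumptions. The function names (`svd_projection`, `projected_adam_step`, `optimizer_state_floats`), the hyperparameter defaults, and the choice to project only from the left are all hypothetical and chosen for illustration.

```python
import numpy as np


def svd_projection(grad, rank):
    """Orthonormal (rows, rank) basis built from the top-`rank` left singular vectors."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]


def projected_adam_step(weight, grad, proj, m, v, step,
                        lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update whose moment estimates live in the low-rank space.

    Assumption: `m` and `v` have shape (rank, cols), not (rows, cols).
    """
    low_rank_grad = proj.T @ grad                     # (rank, cols)
    m = beta1 * m + (1 - beta1) * low_rank_grad       # first moment
    v = beta2 * v + (1 - beta2) * low_rank_grad ** 2  # second moment
    m_hat = m / (1 - beta1 ** step)                   # bias correction
    v_hat = v / (1 - beta2 ** step)
    update = proj @ (m_hat / (np.sqrt(v_hat) + eps))  # back to (rows, cols)
    return weight - lr * update, m, v


def optimizer_state_floats(rows, cols, rank=None):
    """Count stored floats: two moments, plus the projection matrix if low-rank."""
    if rank is None:
        return 2 * rows * cols
    return 2 * rank * cols + rows * rank


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rows, cols, rank = 64, 32, 4
    weight = rng.standard_normal((rows, cols))
    grad = rng.standard_normal((rows, cols))

    proj = svd_projection(grad, rank)
    m = np.zeros((rank, cols))
    v = np.zeros((rank, cols))
    weight, m, v = projected_adam_step(weight, grad, proj, m, v, step=1)

    print("full Adam state floats:", optimizer_state_floats(rows, cols))
    print("projected state floats:", optimizer_state_floats(rows, cols, rank))
```

In this toy setting the moment estimates shrink from `rows x cols` to `rank x cols`, which is where the optimizer memory saving comes from. The sketch recomputes nothing about how consecutive projections relate to each other, which is the gap the abstract says COAP addresses; the full SVD is also the kind of costly projection step the abstract criticizes.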
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning
Methods: AdamW · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
