COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Jinqi Xiao; Shen Sang; Tiancheng Zhi; Jing Liu; Qing Yan; Yuqian Zhang; Linjie Luo; Bo Yuan

arXiv:2412.00071·cs.LG·March 13, 2025



Open Access

TL;DR

COAP is a novel gradient projection method that significantly reduces memory usage during training of large neural networks by considering inter-projection correlations, outperforming existing low-rank methods in efficiency and performance.

Contribution

Introduces COAP, a correlation-aware gradient projection technique that minimizes computational overhead and enhances training efficiency for large-scale neural networks.

Findings

01 Reduces optimizer memory by 61% on LLaMA-1B with minimal time increase.

02 Achieves 81% memory reduction and 4x speedup with higher accuracy on LLaVA-v1.5-7B.

03 Maintains training performance comparable to AdamW across various tasks.

Abstract

Training large-scale neural networks in vision and multimodal domains demands substantial memory resources, primarily due to the storage of optimizer states. While LoRA, a popular parameter-efficient method, reduces memory usage, it often suffers from suboptimal performance due to the constraints of low-rank updates. Low-rank gradient projection methods (e.g., GaLore, Flora) reduce optimizer memory by projecting gradients and moment estimates into low-rank spaces via singular value decomposition or random projection. However, they fail to account for inter-projection correlation, causing performance degradation, and their projection strategies often incur high computational costs. In this paper, we present COAP (Correlation-Aware Gradient Projection), a memory-efficient method that minimizes computational overhead while maintaining training performance. Evaluated across various vision,…
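To make the idea of low-rank gradient projection concrete, here is a minimal NumPy sketch of the generic SVD-based scheme the abstract attributes to methods such as GaLore. It is not COAP's correlation-aware projection, which this page does not describe. The function names, the `state` dictionary layout, the matrix sizes, and the hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np


def make_projection(grad, rank):
    """Return the top-`rank` left singular vectors of `grad` (shape m x rank).

    Hypothetical helper: a generic SVD-based projection, not COAP's update rule.
    """
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]


def projected_adam_step(weight, grad, proj, state,
                        lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step whose moment estimates live in the low-rank space."""
    low = proj.T @ grad  # project the gradient: (rank x n)
    state["step"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * low
    state["v"] = beta2 * state["v"] + (1 - beta2) * low ** 2
    m_hat = state["m"] / (1 - beta1 ** state["step"])  # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["step"])
    update_low = m_hat / (np.sqrt(v_hat) + eps)
    # Map the low-rank update back to the full weight shape before applying it.
    return weight - lr * (proj @ update_low)


# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
m, n, rank = 64, 32, 4
weight = rng.standard_normal((m, n))
grad = rng.standard_normal((m, n))

proj = make_projection(grad, rank)
state = {"step": 0, "m": np.zeros((rank, n)), "v": np.zeros((rank, n))}
new_weight = projected_adam_step(weight, grad, proj, state)

# Number of stored optimizer values per weight matrix.
full_state = 2 * m * n                     # two full-size Adam moments
low_rank_state = 2 * rank * n + m * rank   # two low-rank moments plus the projection
```

With these assumed sizes, the full Adam state holds 4096 values and the projected state holds 512. The saving comes from keeping both moment estimates at shape `rank x n` instead of `m x n`. The cost the abstract points to is visible in `make_projection`: recomputing an SVD whenever the projection is refreshed.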


Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning

Methods: AdamW · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings