EcoSpa: Efficient Transformer Training with Coupled Sparsity

Jinqi Xiao; Cheng Luo; Lingyi Huang; Cheng Yang; Yang Sui; Huy Phan; Xiao Zang; Yibiao Ying; Zhexiang Tang; Anima Anandkumar; Bo Yuan

arXiv:2511.11641·cs.LG·November 18, 2025

TL;DR

EcoSpa is a structured sparse training method for transformers that preserves the interactions between coupled weight matrices, reducing training memory, model size, and inference latency without specialized hardware.

Contribution

EcoSpa introduces a coupled sparsity approach that jointly evaluates and sparsifies pairs of weight matrices that interact multiplicatively, removing aligned rows and columns so that their structural relationship is preserved at high sparsity.

Findings

(1) 50% memory reduction in LLaMA-1B training

(2) 2.2× model compression on GPT-2-Medium

(3) 1.6× inference speedup

Abstract

Transformers have become the backbone of modern AI, yet their high computational demands pose critical system challenges. While sparse training offers efficiency gains, existing methods fail to preserve critical structural relationships between weight matrices that interact multiplicatively in attention and feed-forward layers. This oversight leads to performance degradation at high sparsity levels. We introduce EcoSpa, an efficient structured sparse training method that jointly evaluates and sparsifies coupled weight matrix pairs, preserving their interaction patterns through aligned row/column removal. EcoSpa introduces a new granularity for calibrating structural component importance and performs coupled estimation and sparsification across both pre-training and fine-tuning scenarios. Evaluations demonstrate substantial improvements: EcoSpa enables efficient training of LLaMA-1B with…
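To make "aligned row/column removal" concrete, here is a minimal NumPy sketch for one coupled pair whose product is `w_in @ w_out`. It is an illustration under stated assumptions, not the paper's algorithm: the function name `coupled_prune`, the `keep_ratio` parameter, and the norm-product importance score are all hypothetical, and the pair is treated as purely linear (no activation in between).

```python
import numpy as np

def coupled_prune(w_in, w_out, keep_ratio):
    """Jointly prune the shared inner dimension of the product w_in @ w_out.

    w_in has shape (d_model, d_hidden); w_out has shape (d_hidden, d_model).
    Column j of w_in and row j of w_out are scored and removed together,
    so the two matrices stay aligned after pruning.
    """
    # Assumed importance score (not from the paper): the Frobenius norm of
    # the rank-1 term w_in[:, j] w_out[j, :], which equals the product of
    # the column norm and the row norm.
    scores = np.linalg.norm(w_in, axis=0) * np.linalg.norm(w_out, axis=1)
    k = max(1, int(round(keep_ratio * scores.size)))
    # Indices of the k highest-scoring components, in ascending order.
    keep = np.sort(np.argsort(scores)[-k:])
    return w_in[:, keep], w_out[keep, :], keep

rng = np.random.default_rng(0)
w_in = rng.standard_normal((8, 16))
w_out = rng.standard_normal((16, 8))
a, b, keep = coupled_prune(w_in, w_out, keep_ratio=0.5)
print(a.shape, b.shape)  # both matrices shrink along the shared dimension
```

Because the same indices are dropped from both matrices, the pruned product `a @ b` is exactly the original product with the removed rank-1 terms left out. Scoring each matrix independently could instead keep a column of `w_in` whose matching row of `w_out` was discarded, which is the kind of mismatch the abstract attributes to existing methods.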

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Stochastic Gradient Optimization Techniques