TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training
Michael Menezes, Barbara Su, Xinze Feng, Yehya Farhat, Hamza Shili, Anastasios Kyrillidis

TL;DR
TwIST is a distributed training framework that efficiently sparsifies large language models by training multiple subnetworks in parallel, enabling zero-cost pruning with competitive perplexity and practical inference speedups.
Contribution
TwIST introduces a novel parallel subnetwork training method that finds high-quality sparse models during training without post-processing, improving efficiency and deployment practicality.
Findings
Achieves a perplexity of 23.14 under aggressive sparsity (50%+), compared to 31.64 for the closest prior approach.
Produces structured, dense matrices suitable for hardware acceleration.
Enables zero-cost pruning with no additional fine-tuning or recovery steps.
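To make the "structured, dense matrices" finding concrete, here is a minimal sketch of what structured pruning of one hidden layer can look like. It is an illustration under assumptions, not TwIST's implementation: the two-layer ReLU block, the function names (`prune_structured`, `forward`), and the choice to prune whole hidden units are all hypothetical. The point it shows is that dropping entire hidden units leaves smaller dense matrices, rather than same-sized matrices with scattered zeros.

```python
def prune_structured(w_in, w_out, keep):
    """Keep only the hidden units listed in `keep` (hypothetical helper).

    w_in:  hidden x inputs  (one row per hidden unit)
    w_out: outputs x hidden (one column per hidden unit)
    Returns two smaller dense matrices.
    """
    small_in = [w_in[h] for h in keep]                      # drop rows
    small_out = [[row[h] for h in keep] for row in w_out]   # drop columns
    return small_in, small_out


def matvec(m, v):
    """Plain dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def forward(w_in, w_out, x):
    """Two-layer block with a ReLU in between (assumed toy architecture)."""
    hidden = [max(0.0, z) for z in matvec(w_in, x)]
    return matvec(w_out, hidden)


# Toy weights: 3 inputs, 4 hidden units, 2 outputs.
w_in = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [0.0, 2.0, 0.0], [-1.0, -1.0, -1.0]]
w_out = [[1.0, 2.0, 3.0, 4.0], [0.0, -1.0, 0.5, 1.0]]

# 50% structured sparsity: keep 2 of the 4 hidden units.
small_in, small_out = prune_structured(w_in, w_out, keep=[0, 2])
```

Here `small_in` is 2 x 3 and `small_out` is 2 x 2, so the pruned block runs with ordinary dense kernels and stores half the hidden-layer weights. An unstructured mask at the same sparsity would keep the original 4 x 3 and 2 x 4 shapes.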
Abstract
We introduce TwIST, a distributed training framework for efficient large language model (LLM) sparsification. TwIST trains multiple subnetworks in parallel, periodically aggregates their parameters, and resamples new subnetworks during training. This process identifies high-quality subnetworks ("golden tickets") without requiring post-training procedures such as calibration or Hessian-based recovery. As a result, TwIST enables zero-cost pruning at deployment time while achieving perplexity competitive with state-of-the-art post-training sparsification methods. The benefits are most pronounced under aggressive sparsity (e.g., 50%+), where TwIST significantly outperforms baseline methods, for example reaching 23.14 PPL compared to 31.64 for the closest prior approach. Unlike unstructured pruning, TwIST produces structured, dense matrices that offer practical inference speedups and memory…
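The train, aggregate, and resample cycle described in the abstract can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not say how subnetworks are sampled or how parameters are aggregated, so the disjoint random partition of hidden units, the write-back aggregation, the quadratic toy objective, and every name below (`sample_partition`, `local_train`, `aggregate`, `train`) are hypothetical. Workers are also run one after another here, whereas the framework trains them in parallel.

```python
import random


def sample_partition(num_units, num_workers, rng):
    """Randomly split unit indices into disjoint subnetworks (assumed scheme)."""
    units = list(range(num_units))
    rng.shuffle(units)
    return [sorted(units[w::num_workers]) for w in range(num_workers)]


def local_train(weights, units, steps, lr, target):
    """Toy local update: a worker only touches the units in its subnetwork."""
    local = {u: weights[u] for u in units}
    for _ in range(steps):
        for u in units:
            grad = local[u] - target[u]  # gradient of 0.5 * (w - t)^2
            local[u] -= lr * grad
    return local


def aggregate(weights, local_results):
    """Write each worker's updated units back into the full parameter vector."""
    merged = list(weights)
    for local in local_results:
        for u, value in local.items():
            merged[u] = value
    return merged


def train(num_units=8, num_workers=2, rounds=5, steps=10, lr=0.1, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * num_units
    target = [float(u) for u in range(num_units)]  # stand-in for real data
    for _ in range(rounds):
        # Resample new subnetworks at the start of every round.
        partition = sample_partition(num_units, num_workers, rng)
        # Sequential stand-in for workers that would run in parallel.
        results = [local_train(weights, units, steps, lr, target)
                   for units in partition]
        # Periodic aggregation into the shared model.
        weights = aggregate(weights, results)
    return weights, target
```

Because the partition is disjoint in this sketch, aggregation needs no averaging. A scheme with overlapping subnetworks would have to reconcile units held by more than one worker, which is a design choice the abstract leaves open.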
Taxonomy
Topics: Advanced Neural Network Applications · Topic Modeling · Generative Adversarial Networks and Image Synthesis
