TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training

Michael Menezes; Barbara Su; Xinze Feng; Yehya Farhat; Hamza Shili; Anastasios Kyrillidis

arXiv:2511.03983·cs.LG·November 7, 2025

Open Access

TL;DR

TwIST is a distributed training framework that efficiently sparsifies large language models by training multiple subnetworks in parallel, enabling zero-cost pruning with competitive perplexity and practical inference speedups.

Contribution

TwIST introduces a parallel subnetwork training method that identifies high-quality sparse subnetworks during training itself, without post-training procedures such as calibration or Hessian-based recovery, so pruning at deployment time carries no additional cost.

Findings

01

Reaches a perplexity of 23.14 under aggressive sparsity (e.g., 50%+), compared with 31.64 for the closest prior approach.

02

Produces structured, dense matrices suitable for hardware acceleration.

03

Enables zero-cost pruning with no additional fine-tuning or recovery steps.

Abstract

We introduce TwIST, a distributed training framework for efficient large language model (LLM) sparsification. TwIST trains multiple subnetworks in parallel, periodically aggregates their parameters, and resamples new subnetworks during training. This process identifies high-quality subnetworks ("golden tickets") without requiring post-training procedures such as calibration or Hessian-based recovery. As a result, TwIST enables zero-cost pruning at deployment time while achieving perplexity competitive with state-of-the-art post-training sparsification methods. The benefits are most pronounced under aggressive sparsity (e.g., 50%+), where TwIST significantly outperforms baseline methods; for example, reaching 23.14 PPL compared to 31.64 for the closest prior approach. Unlike unstructured pruning, TwIST produces structured, dense matrices that offer practical inference speedups and memory…
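The train, aggregate, and resample cycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a two-layer ReLU network instead of a transformer, subnetworks defined by disjoint groups of hidden units, plain SGD on squared error, and sequential execution standing in for parallel workers. All function names and hyperparameters are hypothetical.

```python
import random

def sample_partition(num_units, num_workers, rng):
    """Randomly split hidden-unit indices into disjoint groups, one per worker."""
    units = list(range(num_units))
    rng.shuffle(units)
    return [sorted(units[w::num_workers]) for w in range(num_workers)]

def extract_subnetwork(W1, w2, units):
    """Copy out a smaller dense model: the rows of W1 and entries of w2 in `units`."""
    return [list(W1[u]) for u in units], [w2[u] for u in units]

def forward(W1, w2, x):
    """Two-layer ReLU network; returns the prediction and hidden activations."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sum(h * v for h, v in zip(hidden, w2)), hidden

def local_sgd(W1, w2, data, lr, steps, rng):
    """Train one subnetwork in place with SGD on squared error (toy objective)."""
    for _ in range(steps):
        x, y = rng.choice(data)
        pred, hidden = forward(W1, w2, x)
        err = pred - y
        for j, row in enumerate(W1):
            grad_out = err * hidden[j]      # gradient w.r.t. w2[j]
            if hidden[j] > 0.0:             # ReLU passes gradient only when active
                for i in range(len(row)):
                    row[i] -= lr * err * w2[j] * x[i]
            w2[j] -= lr * grad_out

def write_back(W1, w2, units, sub_W1, sub_w2):
    """Aggregate: with disjoint groups, copy trained slices into the full model."""
    for k, u in enumerate(units):
        W1[u] = sub_W1[k]
        w2[u] = sub_w2[k]

def train(W1, w2, data, num_workers=2, rounds=5, local_steps=20, lr=0.01, seed=0):
    """Repeat: resample subnetworks, train each independently, aggregate."""
    rng = random.Random(seed)
    partition = None
    for _ in range(rounds):
        partition = sample_partition(len(W1), num_workers, rng)
        trained = []
        for units in partition:             # conceptually runs in parallel
            sub_W1, sub_w2 = extract_subnetwork(W1, w2, units)
            local_sgd(sub_W1, sub_w2, data, lr, local_steps, rng)
            trained.append((units, sub_W1, sub_w2))
        for units, sub_W1, sub_w2 in trained:
            write_back(W1, w2, units, sub_W1, sub_w2)
    return partition                        # the last sampled subnetworks
```

Because each subnetwork is a set of whole hidden units, `extract_subnetwork` returns smaller dense matrices rather than a masked full-size matrix, which mirrors the structured sparsity the abstract contrasts with unstructured pruning. With disjoint groups, aggregation reduces to copying slices back; overlapping subnetworks would need some form of averaging, and the abstract does not say which scheme TwIST uses.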

Taxonomy

Topics: Advanced Neural Network Applications · Topic Modeling · Generative Adversarial Networks and Image Synthesis