DOTResize: Reducing LLM Width via Discrete Optimal Transport-based Neuron Merging
Neha Verma, Kenton Murray, Kevin Duh

TL;DR
DOTResize is a method for compressing large language models that uses optimal transport to reproject and merge neurons, preserving useful information while reducing computational cost.
Contribution
Proposes a neuron width reduction technique based on discrete optimal transport. Instead of discarding neurons judged unimportant, as pruning does, it recombines their information into the neurons that remain.
Findings
Achieves measurable reductions in computational cost.
Serves as an effective add-on to existing pruning techniques.
Maintains model performance while reducing size.
Abstract
Structured pruning methods designed for Large Language Models (LLMs) generally focus on identifying and removing the least important components to optimize model size. However, in this work, we question this prevalent approach by instead exploring how to recombine information from structures designated for pruning back into the reduced model. We specifically focus on neuron width reduction and frame it as a Discrete Optimal Transport problem. We propose DOTResize, a novel Transformer compression method that uses optimal transport theory to transform and compress model width. To ensure applicability within the Transformer architecture, we motivate and incorporate necessary entropic regularization and matrix factorization techniques into the transportation maps produced by our method. Unlike pruning-based approaches, which discard neurons based on importance measures, DOTResize…
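To make the idea in the abstract concrete, the following is a minimal sketch of entropically regularized discrete optimal transport between a full set of neurons and a retained subset, computed with Sinkhorn iterations in NumPy. It is not the authors' implementation. The function names (`sinkhorn`, `transport_map`), the squared-Euclidean cost on activations, the uniform marginals, and the column normalization are all assumptions made for illustration, and the matrix factorization step mentioned in the abstract is omitted.

```python
import numpy as np


def sinkhorn(cost, a, b, reg=0.1, n_iters=200):
    """Entropic-regularized discrete OT plan via Sinkhorn iterations.

    cost: (n, k) cost matrix; a: (n,) source marginal; b: (k,) target marginal.
    """
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # rescale to match the target marginal
        u = a / (K @ v)              # rescale to match the source marginal
    return u[:, None] * K * v[None, :]


def transport_map(acts, keep_idx, reg=0.1):
    """Hypothetical helper: map n neurons onto the k neurons in keep_idx.

    acts: (n_samples, n_neurons) calibration activations (an assumption;
    the abstract does not say how the cost is built).
    Returns an (n, k) matrix whose columns each sum to 1.
    """
    src = acts.T                     # (n, n_samples), one row per neuron
    tgt = acts[:, keep_idx].T        # (k, n_samples)
    # Squared Euclidean distance between neuron activation vectors.
    cost = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()         # scale to [0, 1] so exp() stays well-behaved
    n, k = cost.shape
    plan = sinkhorn(cost, np.full(n, 1.0 / n), np.full(k, 1.0 / k), reg)
    # Column-normalize so each retained neuron is a convex combination
    # of the original neurons.
    return plan / plan.sum(axis=0, keepdims=True)
```

Under these assumptions, `acts @ transport_map(acts, keep_idx)` yields reduced-width activations in which every original neuron contributes some mass to the retained ones, rather than being dropped outright as in pruning. A larger `reg` spreads mass more evenly across neurons, while a smaller `reg` moves the map closer to a hard assignment.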
Taxonomy
Topics: Natural Language Processing Techniques
