GeLaCo: An Evolutionary Approach to Layer Compression
David Ponce, Thierry Etchegoyhen, Javier Del Ser

TL;DR
GeLaCo is an evolutionary method for compressing large language models via layer collapse. It explores the compression solution space with population-based search and balances compression against model quality, and is reported to outperform existing methods.
Contribution
It presents the first Pareto frontier for LLM compression using evolutionary search with a module-wise similarity fitness function.
Findings
Outperforms state-of-the-art compression methods
Supports multi-objective optimization for compression and quality
Effectively explores the solution space with population-based search
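To make the multi-objective finding concrete, here is a minimal sketch of how a Pareto front over compression and quality can be extracted from a set of candidates. The `Candidate` tuple layout, the function names, and the numbers are assumptions for illustration only; they are not results or code from the paper.

```python
from typing import List, Tuple

# Hypothetical candidate: (compression_ratio, quality), both to be maximized.
Candidate = Tuple[float, float]

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

# Made-up values, not taken from the paper.
candidates = [(0.10, 0.98), (0.25, 0.95), (0.25, 0.90), (0.50, 0.80), (0.40, 0.78)]
front = pareto_front(candidates)
```

Each point left in `front` is a trade-off that cannot be improved on one objective without losing on the other, which is what a multi-objective search returns instead of a single best model.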
Abstract
Large Language Models (LLMs) have achieved remarkable performance across a large number of tasks, but face critical deployment and usage barriers due to substantial computational requirements. Model compression methods, which aim to reduce model size while preserving its capacity, are an important means to mitigate these issues. Promising approaches along these lines, such as structured pruning, typically require costly empirical search for optimal variants and may run the risk of ignoring better solutions. In this work we introduce GeLaCo, an evolutionary approach to LLM compression via layer collapse. Our approach supports an efficient exploration of the compression solution space via population-based search and a module-wise similarity fitness function capturing attention, feed-forward, and hidden state representations. GeLaCo also supports both single- and multi-objective evolutionary…
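The population-based search described in the abstract can be illustrated with a toy single-objective genetic algorithm over sets of layers to collapse. This is a sketch under stated assumptions, not GeLaCo's implementation: `NUM_LAYERS`, `NUM_COLLAPSED`, the random `REDUNDANCY` scores, and the crossover and mutation operators are all hypothetical, and the placeholder `fitness` merely stands in for the paper's module-wise similarity function.

```python
import random
from typing import List, Tuple

NUM_LAYERS = 12     # hypothetical model depth
NUM_COLLAPSED = 4   # hypothetical number of layers to collapse

# Placeholder per-layer scores; a real fitness would compare module outputs.
_score_rng = random.Random(0)
REDUNDANCY: List[float] = [_score_rng.random() for _ in range(NUM_LAYERS)]

Individual = Tuple[int, ...]  # sorted indices of the layers to collapse

def fitness(ind: Individual) -> float:
    """Toy fitness: total placeholder score of the selected layers."""
    return sum(REDUNDANCY[i] for i in ind)

def random_individual(rng: random.Random) -> Individual:
    return tuple(sorted(rng.sample(range(NUM_LAYERS), NUM_COLLAPSED)))

def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """Draw the child's layers from the union of both parents' layers."""
    pool = sorted(set(a) | set(b))
    return tuple(sorted(rng.sample(pool, NUM_COLLAPSED)))

def mutate(ind: Individual, rng: random.Random, rate: float = 0.2) -> Individual:
    """With probability `rate`, swap one selected layer for an unselected one."""
    if rng.random() >= rate:
        return ind
    kept = list(ind)
    kept.remove(rng.choice(kept))
    unused = [i for i in range(NUM_LAYERS) if i not in ind]
    kept.append(rng.choice(unused))
    return tuple(sorted(kept))

def evolve(pop_size: int = 20, generations: int = 30, seed: int = 1) -> Individual:
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, keeps the elite
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
```

Because the top half of each generation survives unchanged, the best fitness never decreases across generations. Swapping the placeholder `fitness` for a similarity measure computed on real model activations would be the step that ties a sketch like this to an actual compression task.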
Taxonomy
Topics: Semiconductor Lasers and Optical Devices · Interconnection Networks and Systems
