Krony-PT: GPT2 compressed with Kronecker Products
Mohamed Ayoub Ben Ayad, Jelena Mitrovic, and Michael Granitzer

TL;DR
Krony-PT compresses GPT-2 by replacing the feed-forward weight matrices of each transformer block with Kronecker products, shrinking the 124M-parameter model to between 80M and 96M parameters. The 81M variant outperforms DistilGPT2 on next-token prediction.
Contribution
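The paper presents a Kronecker product compression technique for the feed-forward layers of GPT-2, together with two ways to initialize the Kronecker factors: a modified Van Loan decomposition and a new pruning-based initialization. The resulting models are smaller than the original while remaining competitive on language modeling.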
Findings
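The compressed models range from 80M to 96M parameters, down from the original 124M-parameter GPT-2.
The 81M Krony-PT model outperforms DistilGPT2 on next-token prediction across all standard language modeling datasets.
The 81M model is competitive with or comparable to significantly larger Kronecker-based compressions of GPT-2.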
Abstract
We introduce Krony-PT, a compression technique for GPT-2 based on Kronecker products. We specifically target the feed-forward weights of each transformer block, and systematically compress the feed-forward layer matrices to various degrees. We introduce a modified Van Loan decomposition to initialize new Kronecker factors, and also propose a new pruning-based initialization technique. Our method compresses the original 124M-parameter GPT-2 to various smaller models, ranging from 80M to 96M. Our 81M model variant outperforms DistilGPT2 on next-token prediction across all standard language modeling datasets, and shows competitive or comparable performance with significantly larger Kronecker-based compressions of GPT-2.
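To make the idea of initializing Kronecker factors concrete, here is a minimal NumPy sketch of the standard Van Loan nearest-Kronecker-product decomposition. It is not the modified variant the abstract refers to, whose details are not given here. The function name `nearest_kronecker` and the factor shapes in the usage lines are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def nearest_kronecker(W, shape_a, shape_b):
    """Return A, B minimizing ||W - kron(A, B)||_F (standard Van Loan).

    Hypothetical helper for illustration; W must have shape
    (m1 * m2, n1 * n2) where shape_a = (m1, n1) and shape_b = (m2, n2).
    """
    m1, n1 = shape_a
    m2, n2 = shape_b
    if W.shape != (m1 * m2, n1 * n2):
        raise ValueError("W shape does not match the requested factor shapes")

    # Rearrange W so that each row holds one flattened (m2 x n2) block.
    # For W = kron(A, B) this rearrangement equals vec(A) vec(B)^T,
    # a rank-1 matrix, so the best rank-1 fit gives the best factors.
    R = (
        W.reshape(m1, m2, n1, n2)
        .transpose(0, 2, 1, 3)
        .reshape(m1 * n1, m2 * n2)
    )

    # The leading singular triplet is the best rank-1 approximation of R.
    U, S, Vt = np.linalg.svd(R, full_matrices=False)
    scale = np.sqrt(S[0])
    A = scale * U[:, 0].reshape(m1, n1)
    B = scale * Vt[0].reshape(m2, n2)
    return A, B


# Usage with small, arbitrary shapes (assumed for the demo only).
rng = np.random.default_rng(0)
W = rng.standard_normal((20, 6))
A, B = nearest_kronecker(W, (4, 3), (5, 2))
rel_err = np.linalg.norm(W - np.kron(A, B)) / np.linalg.norm(W)
print(f"params: {W.size} -> {A.size + B.size}, relative error {rel_err:.3f}")
```

The two factors store `m1*n1 + m2*n2` numbers instead of `m1*m2*n1*n2`, which is where the parameter savings come from. A random matrix is poorly approximated by a single Kronecker product, so a decomposition like this serves as a starting point for further training rather than as a drop-in replacement.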
Taxonomy
Topics: Numerical Methods and Algorithms · Parallel Computing and Optimization Techniques · Mathematics, Computing, and Information Processing
