Krony-PT: GPT2 compressed with Kronecker Products

Mohamed Ayoub Ben Ayad, Jelena Mitrovic, and Michael Granitzer

arXiv:2412.12351 · cs.LG · October 2, 2025

TL;DR

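Krony-PT compresses GPT-2 by replacing the feed-forward weight matrices of each transformer block with Kronecker products of smaller factors. This shrinks the 124M-parameter model to between 80M and 96M parameters while remaining competitive on language modeling, and the 81M variant outperforms DistilGPT2 on next-token prediction.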
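
A weight matrix of the form W = A ⊗ B, with A of shape m1 × n1 and B of shape m2 × n2, stores only the two small factors and never has to be materialized: if an input vector x of length n1·n2 is reshaped row-major into an n1 × n2 matrix X, then (A ⊗ B)x equals A X Bᵀ flattened. The NumPy sketch below illustrates this generic identity. The function name `kron_linear` and the factor shapes for a 768 → 3072 feed-forward projection are hypothetical illustrations, not the authors' code or configuration.

```python
import numpy as np

def kron_linear(x, A, B, bias=None):
    """Apply y = (A ⊗ B) x + bias to each row of x without forming A ⊗ B.

    Convention (an assumption for this sketch): the dense weight W = A ⊗ B has
    shape (out, in), A: (m1, n1), B: (m2, n2), x: (batch, n1 * n2),
    and the result y has shape (batch, m1 * m2).
    """
    m1, n1 = A.shape
    m2, n2 = B.shape
    batch = x.shape[0]
    # Row-major reshape: X[b, i, j] = x[b, i * n2 + j]
    X = x.reshape(batch, n1, n2)
    # (A ⊗ B) x corresponds to A @ X @ B^T per batch element: two small matmuls
    Y = A @ X @ B.T                      # shape (batch, m1, m2)
    y = Y.reshape(batch, m1 * m2)
    if bias is not None:
        y = y + bias
    return y

rng = np.random.default_rng(0)
# Hypothetical factor shapes for a 768 -> 3072 projection:
# (64 * 48) x (32 * 24) = 3072 x 768
A = rng.standard_normal((64, 32))
B = rng.standard_normal((48, 24))
x = rng.standard_normal((4, 768))

y = kron_linear(x, A, B)
# Reference: multiply by the explicitly materialized Kronecker product
y_ref = x @ np.kron(A, B).T
print(y.shape, np.allclose(y, y_ref))   # (4, 3072) True
print("dense params:", 3072 * 768, "factor params:", A.size + B.size)
# dense params: 2359296 factor params: 3200
```

Evaluating the factors separately avoids ever storing the 3072 × 768 dense matrix; whether it is also faster in practice depends on the factor shapes and the hardware.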

Contribution

The paper presents a Kronecker product compression technique for the feed-forward layers of GPT-2, together with a modified Van Loan decomposition and a new pruning-based technique for initializing the Kronecker factors. It reduces the 124M-parameter model to as few as 80M parameters with competitive accuracy.
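
The classical nearest-Kronecker-product method of Van Loan and Pitsianis, which the paper modifies for its initialization, rearranges W so that each m2 × n2 block becomes one row of a matrix R. The Frobenius error ‖W − A ⊗ B‖ then equals the error of approximating R by the rank-1 matrix vec(A) vec(B)ᵀ, which a truncated SVD solves optimally. The NumPy sketch below implements only this classical version, with an optional rank-r sum of Kronecker products. The paper's specific modification is not described in this summary and is not reproduced here, and the function name `nearest_kronecker` is hypothetical.

```python
import numpy as np

def nearest_kronecker(W, a_shape, b_shape, rank=1):
    """Classical Van Loan-Pitsianis approximation W ≈ sum_r A_r ⊗ B_r.

    W: (m1 * m2, n1 * n2); a_shape = (m1, n1); b_shape = (m2, n2).
    Returns lists [A_1, ...], [B_1, ...] minimizing the Frobenius error.
    """
    m1, n1 = a_shape
    m2, n2 = b_shape
    assert W.shape == (m1 * m2, n1 * n2)
    # Rearrange so block (i, j) of W, i.e. W[i*m2:(i+1)*m2, j*n2:(j+1)*n2],
    # becomes row i * n1 + j of R (flattened row-major)
    R = W.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    # ||W - A ⊗ B||_F = ||R - vec(A) vec(B)^T||_F, so a truncated SVD is optimal
    U, S, Vt = np.linalg.svd(R, full_matrices=False)
    As, Bs = [], []
    for r in range(rank):
        scale = np.sqrt(S[r])            # split the singular value between factors
        As.append((scale * U[:, r]).reshape(m1, n1))
        Bs.append((scale * Vt[r]).reshape(m2, n2))
    return As, Bs

rng = np.random.default_rng(0)
A0, B0 = rng.standard_normal((4, 3)), rng.standard_normal((5, 6))
W = np.kron(A0, B0)                      # exactly Kronecker-structured, 20 x 18
As, Bs = nearest_kronecker(W, (4, 3), (5, 6))
print(np.allclose(np.kron(As[0], Bs[0]), W))   # True

# For an unstructured matrix, more Kronecker terms give a smaller error
W_rand = rng.standard_normal((20, 18))
for r in (1, 2, 4):
    As, Bs = nearest_kronecker(W_rand, (4, 3), (5, 6), rank=r)
    approx = sum(np.kron(A, B) for A, B in zip(As, Bs))
    print(r, np.linalg.norm(W_rand - approx))
```

The individual factors are only determined up to a scalar (A can be scaled by c and B by 1/c), but their Kronecker product is unique, which is why the check compares the products rather than the factors.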

Findings

01

Compressed GPT-2 models range from 80M to 96M parameters.

02

The 81M Krony-PT model outperforms DistilGPT2 on next-token prediction.

03

Krony-PT models are competitive with significantly larger Kronecker-based compressions of GPT-2.

Abstract

We introduce Krony-PT, a compression technique for GPT-2 based on Kronecker products. We specifically target the feed-forward weights of each transformer block, and systematically compress the feed-forward layer matrices to various degrees. We introduce a modified Van Loan decomposition to initialize new Kronecker factors, and also propose a new pruning-based initialization technique. Our method compresses the original 124M-parameter GPT-2 to various smaller models, ranging from 80M to 96M parameters. Our 81M model variant outperforms DistilGPT2 on next-token prediction across all standard language modeling datasets, and is competitive with significantly larger Kronecker-based compressions of GPT-2.
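
Assuming, as the abstract states, that only the feed-forward weight matrices are compressed, the parameter budget can be checked with simple arithmetic. The sketch below counts GPT-2 small parameters (tied token embeddings, biases and LayerNorms included) and lets each feed-forward weight matrix be replaced by a sum of Kronecker products. The helper names and the rank-1 (64 × 32) ⊗ (48 × 24) factorization are hypothetical and are not the paper's configurations.

```python
# GPT-2 small (124M) hyperparameters
VOCAB, CTX, D, D_FF, LAYERS = 50257, 1024, 768, 3072, 12

def gpt2_params(ff_weight_params=None):
    """Count GPT-2 small parameters (tied embeddings, biases, LayerNorms included).

    ff_weight_params: parameters used per feed-forward weight matrix
    (c_fc and c_proj); None means the dense D x D_FF matrices.
    """
    if ff_weight_params is None:
        ff_weight_params = D * D_FF
    embeddings = VOCAB * D + CTX * D                # token + position embeddings
    layer_norms = 2 * (2 * D)                       # ln_1 and ln_2 (scale + shift)
    attention = (D * 3 * D + 3 * D) + (D * D + D)   # c_attn + attention c_proj
    mlp = 2 * ff_weight_params + D_FF + D           # two FF weights + their biases
    block = layer_norms + attention + mlp
    return embeddings + LAYERS * block + 2 * D      # + final LayerNorm ln_f

def kron_params(a_shape, b_shape, rank=1):
    """Parameters in a sum of `rank` Kronecker products A_r ⊗ B_r."""
    return rank * (a_shape[0] * a_shape[1] + b_shape[0] * b_shape[1])

dense = gpt2_params()
ff_dense = LAYERS * 2 * D * D_FF
# Hypothetical extreme case: 3072 x 768 = (64 x 32) ⊗ (48 x 24), rank 1
compressed = gpt2_params(kron_params((64, 32), (48, 24)))
print(dense, ff_dense, compressed)   # 124439808 56623104 67893504
```

Under this assumption, even near-total compression of the feed-forward weights leaves about 67.9M parameters, so the reported 80M to 96M variants keep considerably more capacity in their feed-forward layers than this extreme case.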

Taxonomy

Topics: Numerical Methods and Algorithms · Parallel Computing and Optimization Techniques · Mathematics, Computing, and Information Processing