Fast Tensorization of Neural Networks via Slice-wise Feature Distillation
Safa Hamreras, Sukhbinder Singh, Román Orús

TL;DR
This paper introduces a scalable, slice-wise tensorization method for neural network compression that improves accuracy and efficiency over traditional global approaches, demonstrated on ResNet-34 and GPT-2 XL.
Contribution
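The proposed framework decomposes a network into slices (individual layers or blocks, or small groups of consecutive layers) and tensorizes each slice independently to reproduce the pretrained model's intermediate representations, enabling faster, more accurate compression that scales to large models.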
Findings
Achieves near-lossless compression on ResNet-34 at moderate compression rates, with faster optimization than conventional global tensorization.
Demonstrates scalability and effectiveness on GPT-2 XL.
Reduces data requirements compared to global tensorization methods.
Abstract
We propose a scalable tensorization framework for neural network compression based on slice-wise feature distillation. Unlike conventional tensor decomposition methods that rely on costly global finetuning, our approach decomposes the network into slices consisting of either individual layers or blocks (e.g., convolutional layers or MLPs), or small groups of consecutive layers, and tensorizes each slice independently to reproduce the intermediate representations of the original pretrained model. This modular strategy improves accuracy recovery, reduces data requirements, and enables efficient parallel optimization. Experiments on ResNet-34 show significant gains over conventional global tensorization, achieving near-lossless compression at moderate compression rates with faster optimization. Results on GPT-2 XL further demonstrate the scalability of the method and its applicability to…
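The slice-wise idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. Several things here are assumptions made for illustration: a plain low-rank factorization `A @ B` stands in for the tensor decomposition (the summary does not say which decomposition is used), each slice is a single linear layer followed by ReLU, the loss is a mean squared error on the slice's output features, and the names `slice_forward` and `distill_slice` are hypothetical. What the sketch does show is the modular structure: every slice is fitted only against the teacher's recorded input and output features for that slice, so the slices never depend on one another and could be optimized in parallel.

```python
import numpy as np

rng = np.random.default_rng(0)


def slice_forward(weight, x):
    """One slice of the toy network: a linear map followed by ReLU."""
    return np.maximum(x @ weight.T, 0.0)


def distill_slice(w_teacher, x_in, y_target, rank, steps=300, lr=0.01):
    """Fit a rank-`rank` student A @ B to reproduce one slice's output features.

    Assumption: low-rank factorization is a stand-in for tensorization.
    Only this slice's teacher input (x_in) and output (y_target) are used.
    """
    # Initialize the factors from a truncated SVD of the teacher weight.
    u, s, vt = np.linalg.svd(w_teacher, full_matrices=False)
    a = u[:, :rank] * np.sqrt(s[:rank])            # shape (d_out, rank)
    b = np.sqrt(s[:rank])[:, None] * vt[:rank]     # shape (rank, d_in)

    n = x_in.shape[0]
    losses = []
    for _ in range(steps):
        z = x_in @ b.T                             # (n, rank)
        pre = z @ a.T                              # (n, d_out)
        err = np.maximum(pre, 0.0) - y_target
        losses.append(float(np.sum(err ** 2) / n))

        # Gradient of the feature-matching loss through the ReLU.
        grad_pre = 2.0 * err * (pre > 0) / n
        grad_a = grad_pre.T @ z                    # (d_out, rank)
        grad_b = (grad_pre @ a).T @ x_in           # (rank, d_in)
        a -= lr * grad_a
        b -= lr * grad_b

    err = slice_forward(a @ b, x_in) - y_target
    losses.append(float(np.sum(err ** 2) / n))
    return a, b, losses


# Toy "pretrained" teacher: three 32x32 linear+ReLU slices (random weights).
dims = [32, 32, 32, 32]
teacher = [
    rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
    for i in range(len(dims) - 1)
]

# Run the teacher once and record the features at every slice boundary.
x = rng.standard_normal((256, dims[0]))
features = [x]
for w in teacher:
    features.append(slice_forward(w, features[-1]))

# Each slice is distilled independently of the others; this loop could be
# distributed across workers because no iteration reads another's result.
students = []
for i, w in enumerate(teacher):
    a, b, losses = distill_slice(w, features[i], features[i + 1], rank=8)
    students.append((a, b, losses))
    kept = (a.size + b.size) / w.size
    print(f"slice {i}: loss {losses[0]:.4f} -> {losses[-1]:.4f}, "
          f"parameters kept: {kept:.0%}")
```

In this toy setting each student slice stores `rank * (d_in + d_out)` parameters instead of `d_in * d_out`. Because the student always receives the teacher's own input features, errors made in one slice do not influence how the next slice is fitted; how errors accumulate when the compressed slices are chained at inference time is a separate question that this sketch does not address.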
