Geometry-aware training of factorized layers in tensor Tucker format
Emanuele Zangrando, Steffen Schotthöfer, Gianluca Ceruti, Jonas Kusch, Francesco Tudisco

TL;DR
This paper presents a novel geometry-aware training method for factorized neural network layers using Tucker decomposition, enabling dynamic rank adjustment, improved training efficiency, and competitive performance.
Contribution
The paper introduces a training approach for layers factorized in Tucker format that is insensitive to initialization and dynamically updates the rank of each mode, supported by convergence and approximation guarantees.
Findings
Achieves high compression rates during training.
Maintains or improves performance compared to full models.
Provides theoretical convergence and approximation guarantees.
Abstract
Reducing parameter redundancies in neural network architectures is crucial for achieving feasible computational and memory requirements during training and inference phases. Given its easy implementation and flexibility, one promising approach is layer factorization, which reshapes weight tensors into a matrix format and parameterizes them as the product of two low-rank matrices. However, this approach typically requires an initial full-model warm-up phase and prior knowledge of a feasible rank, and it is sensitive to parameter initialization. In this work, we introduce a novel approach to train the factors of a Tucker decomposition of the weight tensors. Our training proposal proves to be optimal in locally approximating the original unfactorized dynamics independently of the initialization. Furthermore, the rank of each mode is dynamically updated during training. We provide a…
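As a rough illustration of the Tucker format mentioned in the abstract, the sketch below stores a 4-way weight tensor as a small core plus one factor matrix per mode, and picks each mode's rank from a relative tolerance on the discarded singular values. This is a plain truncated HOSVD in NumPy, not the training method of the paper; the function names (`mode_product`, `tucker_reconstruct`, `truncated_hosvd`), the tensor shape, and the tolerance are assumptions made only for this example.

```python
import numpy as np


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` (shape: new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, -1)   # bring the target axis last
    result = moved @ matrix.T               # contract it with the matrix
    return np.moveaxis(result, -1, mode)    # put the axis back in place


def tucker_reconstruct(core, factors):
    """Rebuild the full tensor from a Tucker core and one factor per mode."""
    full = core
    for mode, U in enumerate(factors):
        full = mode_product(full, U, mode)
    return full


def truncated_hosvd(weight, tol):
    """Illustrative rank selection (an assumption, not the paper's scheme):
    per mode, keep the fewest leading left singular vectors of the mode
    unfolding such that the discarded singular values have Euclidean norm
    at most tol * ||weight||."""
    norm = np.linalg.norm(weight)
    factors = []
    for mode in range(weight.ndim):
        unfolding = np.moveaxis(weight, mode, 0).reshape(weight.shape[mode], -1)
        U, s, _ = np.linalg.svd(unfolding, full_matrices=False)
        # tail[k] is the norm of the singular values s[k:], i.e. the
        # unfolding error made when only the first k are kept.
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]
        ok = np.nonzero(tail <= tol * norm)[0]
        rank = max(1, int(ok[0])) if ok.size else len(s)
        factors.append(U[:, :rank])
    # Project the weight onto the chosen bases to obtain the core.
    core = weight
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors


# Hypothetical convolution-like weight with exact Tucker ranks (4, 3, 2, 2).
rng = np.random.default_rng(0)
true_core = rng.standard_normal((4, 3, 2, 2))
true_factors = [rng.standard_normal((n, r))
                for n, r in zip((16, 8, 3, 3), true_core.shape)]
W = tucker_reconstruct(true_core, true_factors)

core, factors = truncated_hosvd(W, tol=1e-8)
n_tucker = core.size + sum(U.size for U in factors)
rel_err = np.linalg.norm(tucker_reconstruct(core, factors) - W) / np.linalg.norm(W)
print("mode ranks:", core.shape)
print("parameters (Tucker / full):", n_tucker, "/", W.size)
print("relative reconstruction error:", rel_err)
```

The parameter count of the factorized form is the size of the core plus the sizes of the factor matrices, so the saving depends directly on the per-mode ranks, which is why adjusting them during training matters for the compression rate.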
Taxonomy
Topics: Tensor decomposition and applications · Model Reduction and Neural Networks · Computational Physics and Python Applications
Methods: fail · Pruning · TuckER · Focus
