Tensor Yard: One-Shot Algorithm of Hardware-Friendly Tensor-Train Decomposition for Convolutional Neural Networks
Anuar Taskynov, Vladimir Korviakov, Ivan Mazurenko, Yepan Xiong

TL;DR
This paper introduces a hardware-friendly Tensor-Train decomposition method and a one-shot training algorithm called Tensor Yard, enabling efficient acceleration of CNNs like ResNet on specific hardware with minimal accuracy loss.
Contribution
It proposes a novel Tensor-Train decomposition tailored for hardware efficiency and a one-shot training algorithm that optimizes layer decomposition order for CNN acceleration.
Findings
ResNet-101 accelerated by 14.6% on Ascend 310 NPU
Limited the drop in top-1 ImageNet accuracy to 0.5%
Demonstrated effectiveness of the method on real hardware
Abstract
Deep Learning has become widely used in many economic, technical, and scientific areas of human interest. Evaluating the efficiency of solutions based on Deep Neural Networks should take into account not only the quality metric for the target task but also latency and the constraints of the target platform design. In this paper we present a novel hardware-friendly Tensor-Train decomposition implementation for Convolutional Neural Networks, together with Tensor Yard, a one-shot training algorithm that optimizes the order in which network layers are decomposed. These ideas make it possible to accelerate ResNet models on Ascend 310 NPU devices without significant loss of accuracy. For example, we accelerate ResNet-101 by 14.6% with a drop of 0.5 in top-1 ImageNet accuracy.
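As a rough illustration of what a Tensor-Train factorization of a convolution kernel looks like, the sketch below applies the generic TT-SVD procedure (sequential truncated SVDs) with NumPy. It is not the paper's hardware-friendly variant and not the Tensor Yard algorithm, whose details are not given in this summary. The kernel shape, the rank cap `max_rank`, and the helper names `tt_svd` and `tt_reconstruct` are assumptions made for illustration only.

```python
import numpy as np


def tt_svd(tensor, max_rank):
    """Generic TT-SVD: split a d-way tensor into d three-way cores.

    Core k has shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1.
    `max_rank` is a hypothetical cap applied to every internal rank.
    """
    dims = tensor.shape
    cores = []
    rank_prev = 1
    mat = tensor
    for k in range(len(dims) - 1):
        # Unfold so that rows index (previous rank, current mode).
        mat = mat.reshape(rank_prev * dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, len(s))
        cores.append(u[:, :rank].reshape(rank_prev, dims[k], rank))
        # Carry the remaining factor (singular values times V^T) forward.
        mat = s[:rank, None] * vt[:rank]
        rank_prev = rank
    cores.append(mat.reshape(rank_prev, dims[-1], 1))
    return cores


def tt_reconstruct(cores):
    """Contract the TT cores back into a full tensor."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    # Drop the two boundary ranks, which are always 1.
    return result.squeeze(axis=(0, -1))


# Assumed example: a random kernel shaped (out_channels, in_channels, kH, kW).
rng = np.random.default_rng(0)
weight = rng.standard_normal((16, 16, 3, 3))

cores = tt_svd(weight, max_rank=4)
approx = tt_reconstruct(cores)

print("core shapes:", [c.shape for c in cores])
print("parameters:", weight.size, "->", sum(c.size for c in cores))
print("relative error:", np.linalg.norm(approx - weight) / np.linalg.norm(weight))
```

A random kernel has no low-rank structure, so the reconstruction error here is large; the example only shows the mechanics and the parameter reduction. Whether a given rank preserves accuracy or yields a speedup on a particular device depends on the trained weights and the hardware, which is the question the paper addresses.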
Taxonomy
Topics: Tensor decomposition and applications · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques
Methods: Convolution · Batch Normalization · Residual Connection · Average Pooling · 1x1 Convolution · Residual Block · Global Average Pooling · Kaiming Initialization · Bottleneck Residual Block
