Tensor Yard: One-Shot Algorithm of Hardware-Friendly Tensor-Train Decomposition for Convolutional Neural Networks

Anuar Taskynov; Vladimir Korviakov; Ivan Mazurenko; Yepan Xiong

arXiv:2108.04029·cs.CV·August 10, 2021

TL;DR

This paper introduces a hardware-friendly Tensor-Train decomposition for CNNs and Tensor Yard, a one-shot training algorithm that optimizes the order in which network layers are decomposed. Together they accelerate ResNet models on Ascend 310 NPU devices with little loss of accuracy.

Contribution

The paper proposes a Tensor-Train decomposition implementation tailored to hardware efficiency, and a one-shot training algorithm that optimizes the order in which network layers are decomposed.

Findings

01 ResNet-101 accelerated by 14.6% on the Ascend 310 NPU

02 Top-1 ImageNet accuracy drops by only 0.5%

03 Effectiveness demonstrated on real hardware (Ascend 310 NPU devices)

Abstract

Deep Learning is now widely used in many economic, technical, and scientific areas. The efficiency of solutions based on Deep Neural Networks should account not only for the quality metric on the target task but also for latency and the design constraints of the target platform. In this paper we present a novel hardware-friendly Tensor-Train decomposition implementation for Convolutional Neural Networks, together with Tensor Yard, a one-shot training algorithm that optimizes the order in which network layers are decomposed. These ideas make it possible to accelerate ResNet models on Ascend 310 NPU devices without significant loss of accuracy. For example, we accelerate ResNet-101 by 14.6% with a drop of 0.5 in top-1 ImageNet accuracy.
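
For readers unfamiliar with the Tensor-Train format mentioned in the abstract, the following is a minimal sketch of the generic TT-SVD procedure (sequential truncated SVDs) applied to a 4-D convolution kernel. It is not the paper's hardware-friendly variant and does not implement Tensor Yard. The function names `tt_svd` and `tt_reconstruct`, the kernel shape, and the single `max_rank` cap are assumptions chosen for illustration only.

```python
import numpy as np


def tt_svd(tensor, max_rank):
    """Generic TT-SVD: split `tensor` into 3-D cores of shape (r_prev, n_k, r_next).

    Illustrative only; every TT rank is capped at `max_rank` (an assumed,
    uniform rank policy, not one taken from the paper).
    """
    shape = tensor.shape
    cores = []
    rank_prev = 1
    # First unfolding: mode 0 on the rows, all remaining modes on the columns.
    unfolding = tensor.reshape(rank_prev * shape[0], -1)
    for k in range(len(shape) - 1):
        u, s, vt = np.linalg.svd(unfolding, full_matrices=False)
        rank = min(max_rank, len(s))
        # The truncated left singular vectors form the k-th core.
        cores.append(u[:, :rank].reshape(rank_prev, shape[k], rank))
        # Carry the remainder forward and fold the next mode into the rows.
        remainder = s[:rank, None] * vt[:rank]
        unfolding = remainder.reshape(rank * shape[k + 1], -1)
        rank_prev = rank
    # Whatever is left becomes the last core, with trailing rank 1.
    cores.append(unfolding.reshape(rank_prev, shape[-1], 1))
    return cores


def tt_reconstruct(cores):
    """Contract the cores back into a dense tensor."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape([core.shape[1] for core in cores])


# Hypothetical 3x3 convolution kernel laid out as (C_out, C_in, k_h, k_w).
rng = np.random.default_rng(0)
weight = rng.standard_normal((16, 16, 3, 3))
cores = tt_svd(weight, max_rank=8)

dense_params = weight.size
tt_params = sum(core.size for core in cores)
```

Comparing `tt_params` with `dense_params` shows how the rank cap trades parameter count against approximation error. Whether such a factorization actually reduces latency depends on the target device, which is the gap the paper's hardware-friendly implementation addresses.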

Taxonomy

Topics: Tensor decomposition and applications · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques

Methods: Convolution · Batch Normalization · Residual Connection · Average Pooling · 1x1 Convolution · Residual Block · Global Average Pooling · Kaiming Initialization · Bottleneck Residual Block