Sliced Recursive Transformer

Zhiqiang Shen; Zechun Liu; Eric Xing

arXiv:2111.05297 · cs.CV · July 27, 2022

TL;DR

The paper introduces Sliced Recursive Transformer (SReT), a parameter-efficient vision transformer that shares weights across layers, improves accuracy, reduces computational costs, and enables scalable deep models with minimal overhead.

Contribution

It proposes a novel weight-sharing recursive structure with sliced group self-attention, improving the efficiency and scalability of vision transformers without adding parameters.

Findings

1. Achieves ~2% accuracy gain on ImageNet-1K with recursive weight sharing.
2. Reduces computational cost by 10-30% using sliced group self-attentions.
3. Enables construction of very deep transformers with over 100 shared layers.
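The second finding relies on computing self-attention within groups of tokens rather than across all tokens at once. The arithmetic behind that saving can be sketched as below. This is a rough illustration only: the function name `attention_pair_count`, the token count of 196, and the equal-size grouping are assumptions, not details taken from the paper. It counts query-key pairs only, so the ratios it prints are not the paper's 10-30% figure, which refers to overall cost.

```python
def attention_pair_count(num_tokens, groups=1):
    """Count query-key pairs when tokens are split into equal groups
    and attention is computed only within each group (hypothetical helper)."""
    if groups < 1 or num_tokens % groups != 0:
        raise ValueError("num_tokens must be divisible by a positive group count")
    per_group = num_tokens // groups
    # Each group attends over per_group x per_group pairs.
    return groups * per_group * per_group


n = 196  # assumed token count, e.g. a 14 x 14 grid of patches

full = attention_pair_count(n)               # 196 * 196 = 38416
grouped = attention_pair_count(n, groups=2)  # 2 * 98 * 98 = 19208

print(full, grouped, grouped / full)  # 38416 19208 0.5
```

Splitting N tokens into G equal groups shrinks the pairwise term from N^2 to N^2 / G, which is why grouping can offset part of the extra compute that repeated passes through a shared block introduce.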

Abstract

We present a neat yet effective recursive operation on vision transformers that can improve parameter utilization without involving additional parameters. This is achieved by sharing weights across the depth of transformer networks. The proposed method can obtain a substantial gain (~2%) simply by using a naive recursive operation, requires no special or sophisticated knowledge of network design principles, and introduces minimal computational overhead to the training procedure. To reduce the additional computation caused by the recursive operation while maintaining the superior accuracy, we propose an approximation method that uses multiple sliced group self-attentions across recursive layers, which can reduce the cost by 10-30% with minimal performance loss. We call our model Sliced Recursive Transformer (SReT), a novel and parameter-efficient vision transformer design that is…
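The core idea in the abstract, reusing the same weights at several depths, can be illustrated with a toy sketch. Everything here is a stand-in: `make_block`, `apply_block`, `forward`, and `count_params` are hypothetical names, the "block" is a single residual matrix multiply rather than attention plus an MLP, and looping each block back-to-back is an assumed ordering rather than the authors' exact design. The sketch only shows that effective depth can grow while the parameter count stays fixed.

```python
import random


def make_block(dim, seed):
    """Hypothetical stand-in for a transformer block: one dim x dim weight matrix."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]


def apply_block(weights, x):
    """Toy residual update x + W x (a substitute for attention + MLP)."""
    return [xi + sum(w * xj for w, xj in zip(row, x)) for xi, row in zip(x, weights)]


def count_params(blocks):
    """Total number of stored weights across all blocks."""
    return sum(len(row) for w in blocks for row in w)


def forward(blocks, x, recursions=1):
    """Apply each block `recursions` times in a row, reusing the same weights."""
    for w in blocks:
        for _ in range(recursions):
            x = apply_block(w, x)
    return x


dim = 8
blocks = [make_block(dim, seed=i) for i in range(4)]
x = [1.0] * dim

plain = forward(blocks, x, recursions=1)      # effective depth 4
recursive = forward(blocks, x, recursions=2)  # effective depth 8, same weights

print(count_params(blocks))  # 4 * 8 * 8 = 256 in both cases
```

The recursive pass performs twice as many block applications as the plain pass, which is the extra computation the abstract says the sliced group self-attention is meant to reduce.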

Code & Models

Repositories

szq0214/sret (PyTorch, official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Image Enhancement Techniques · CCD and CMOS Imaging Sensors

Methods: Attention Is All You Need · Linear Layer · Vision Transformer · Multi-Head Attention · Dropout · Layer Normalization · Residual Connection · Dense Connections · Softmax · Absolute Position Encodings