Token Transformer: Can class token help window-based transformer build better long-range interactions?

Jiawei Mao; Yuanqi Chang; Xuesong Yin

arXiv:2211.06083 · cs.CV · January 4, 2023

TL;DR

The paper introduces Token Transformer (TT), which enhances window-based transformers with class tokens for improved long-range interactions, achieving competitive results efficiently.

Contribution

The proposed CLS Attention mechanism and Feature Inheritance Module (FIM) enable better long-range modeling while preserving the hierarchical structure of window-based transformers.

Findings

01 TT achieves state-of-the-art accuracy with fewer parameters.

02 CLS tokens improve long-range interaction in window-based transformers.

03 TT performs well on image classification and downstream tasks.

Abstract

Compared with the vanilla transformer, the window-based transformer offers a better trade-off between accuracy and efficiency. Although the window-based transformer has made great progress, its long-range modeling capabilities are limited by the size of the local window and the window connection scheme. To address this problem, we propose a novel Token Transformer (TT). The core mechanism of TT is the addition of a Class (CLS) token to each local window to summarize that window's information. We refer to this type of token interaction as CLS Attention. These CLS tokens interact spatially with the tokens in each window to enable long-range modeling. To preserve the hierarchical design of the window-based transformer, we design a Feature Inheritance Module (FIM) in each phase of TT to deliver the local window information from the previous phase to the CLS token in the next…
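The abstract describes CLS Attention only at a high level, so the sketch below is one hypothetical reading of it, not the authors' implementation. It assumes a three-step scheme: (1) each window's patch tokens attend together with that window's CLS token, (2) the CLS tokens of all windows attend to one another, and (3) patch tokens read the updated CLS tokens back. It uses single-head attention with no learned Q/K/V projections, LayerNorm, or MLP. The helper names (`cls_attention_block`, `init_cls_tokens`) and the mean-pooling CLS initialization are illustrative assumptions; in particular, the mean pooling is not the paper's Feature Inheritance Module.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Single-head scaled dot-product attention (no learned projections).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def window_partition(x, w):
    # (H, W, C) -> (num_windows, w*w, C), windows in row-major order.
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)


def window_reverse(windows, w, H, W):
    # Inverse of window_partition: (num_windows, w*w, C) -> (H, W, C).
    C = windows.shape[-1]
    x = windows.reshape(H // w, W // w, w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)


def init_cls_tokens(x, w):
    # Hypothetical initialization: mean-pool each window into one CLS token.
    return window_partition(x, w).mean(axis=1)


def cls_attention_block(x, cls, w):
    """Hypothetical CLS-token block. x: (H, W, C); cls: (num_windows, C)."""
    H, W, _ = x.shape
    windows = window_partition(x, w)                              # (N, w*w, C)
    tokens = np.concatenate([cls[:, None, :], windows], axis=1)   # (N, 1+w*w, C)

    # 1) Local: a window's patch tokens and its CLS token attend to each other.
    tokens = tokens + attention(tokens, tokens, tokens)
    cls_local = tokens[:, 0, :]                                   # (N, C)

    # 2) Global: CLS tokens from all windows exchange information.
    cls_global = cls_local + attention(cls_local, cls_local, cls_local)

    # 3) Broadcast: patch tokens read back from every window's CLS token.
    patches = tokens[:, 1:, :]                                    # (N, w*w, C)
    kv = cls_global[None]                                         # (1, N, C)
    patches = patches + attention(patches, kv, kv)

    return window_reverse(patches, w, H, W), cls_global


# Usage: an 8x8 feature map with 16 channels split into four 4x4 windows.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
out, cls = cls_attention_block(x, init_cls_tokens(x, 4), 4)
print(out.shape, cls.shape)  # (8, 8, 16) (4, 16)
```

In this sketch, perturbing only the top-left window changes the output of the bottom-right window. A purely window-local attention layer could not produce that effect, and here the cross-window path runs entirely through the CLS tokens, which is the intuition behind the long-range claim.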


Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification

Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Linear Layer · Softmax · Adam · Absolute Position Encodings · Byte Pair Encoding