Improved Image Classification with Token Fusion

Keong Hun Choi; Jin Woo Kim; Yao Wang; Jong Eun Ha

arXiv:2208.09183 · cs.CV · August 22, 2022

TL;DR

This paper introduces a novel image classification approach that fuses CNN and transformer features through three different token fusion methods, achieving superior performance on ImageNet 1k.

Contribution

It presents three new token fusion techniques combining CNN and transformer features for improved image classification.

Findings

01 Achieved state-of-the-art accuracy on ImageNet 1k

02 Demonstrated effectiveness of multi-level token fusion methods

03 Compared fusion strategies and identified the most effective approach

Abstract

In this paper, we propose a method that fuses CNN and transformer structures to improve image classification performance. A CNN extracts information about local areas of an image well, but is limited in extracting global information. A transformer, on the other hand, is relatively strong at extracting global information, but requires a lot of memory to extract local feature values. In our method, the image is converted into a feature map by a CNN, and each pixel of the feature map is treated as a token. At the same time, the image is divided into patch areas, which the transformer treats as tokens, and the two sets of tokens are fused. To fuse tokens with these two different characteristics, we propose three methods: (1) late token fusion with parallel structure, (2) early token fusion, (3) token fusion in a layer by…
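
The two tokenizations described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the feature map is a random stand-in for a real CNN output, the sizes (32x32 image, 8x8 patches, 4x4x64 feature map, shared width 48) are arbitrary, and `fuse_by_concatenation` is a hypothetical helper showing one simple way to combine two token sets. It is not claimed to match any of the paper's three fusion methods.

```python
import numpy as np


def patch_tokens(image, patch):
    """Split an (H, W, C) image into non-overlapping patches, one flat token per patch."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)


def feature_map_tokens(fmap):
    """Treat every spatial position ("pixel") of an (h, w, d) feature map as one token."""
    h, w, d = fmap.shape
    return fmap.reshape(h * w, d)


def fuse_by_concatenation(tokens_a, tokens_b, w_a, w_b):
    """Hypothetical fusion: project both token sets to a shared width, then stack them."""
    return np.concatenate([tokens_a @ w_a, tokens_b @ w_b], axis=0)


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
fmap = rng.standard_normal((4, 4, 64))  # assumption: stand-in for a CNN feature map

patch_tok = patch_tokens(image, patch=8)  # 16 patch tokens, 192 values each
cnn_tok = feature_map_tokens(fmap)        # 16 feature-map tokens, 64 values each

dim = 48  # assumption: arbitrary shared token width
w_patch = rng.standard_normal((patch_tok.shape[1], dim))
w_cnn = rng.standard_normal((cnn_tok.shape[1], dim))

fused = fuse_by_concatenation(patch_tok, cnn_tok, w_patch, w_cnn)
print(patch_tok.shape, cnn_tok.shape, fused.shape)
```

In this sketch the fused sequence simply holds both token sets side by side, so a transformer layer applied to it could attend across patch tokens and feature-map tokens at once. Where in the network the two sets are combined is exactly what distinguishes the fusion methods listed in the abstract.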

Taxonomy

Topics: Image Processing and 3D Reconstruction