TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition

Meng Lou; Shu Zhang; Hong-Yu Zhou; Sibei Yang; Chuan Wu; Yizhou Yu

arXiv:2310.19380 · cs.CV · April 28, 2025 · 35 cites

TL;DR

TransXNet introduces a dual dynamic token mixer that combines global and local adaptive features, enhancing visual recognition performance with a lightweight hybrid CNN-Transformer architecture.

Contribution

The paper proposes D-Mixer, a novel input-dependent token mixer, and TransXNet, a hybrid backbone that improves representation capacity and efficiency in vision tasks.

Findings

01. TransXNet surpasses Swin-T in ImageNet-1K accuracy while using less computation.

02. The small and base models reach 83.8% and 84.6% top-1 accuracy on ImageNet-1K, respectively.

03. The architecture outperforms state-of-the-art methods on dense prediction tasks.

Abstract

Recent studies have integrated convolutions into transformers to introduce inductive bias and improve generalization performance. However, the static nature of conventional convolution prevents it from dynamically adapting to input variations, resulting in a representation discrepancy between convolution and self-attention as the latter computes attention maps dynamically. Furthermore, when stacking token mixers that consist of convolution and self-attention to form a deep network, the static nature of convolution hinders the fusion of features previously generated by self-attention into convolution kernels. These two limitations result in a sub-optimal representation capacity of the entire network. To find a solution, we propose a lightweight Dual Dynamic Token Mixer (D-Mixer) to simultaneously learn global and local dynamics via computing input-dependent global and local aggregation…
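The contrast the abstract draws between static convolution and input-dependent aggregation can be illustrated with a toy example. The sketch below is not the paper's D-Mixer; it works on a 1D list of scalar tokens, and every name in it (`static_local_mix`, `dynamic_local_mix`, `global_mix`, `kernel_bank`, `gate_weights`) is a hypothetical chosen for illustration. It assumes one simple way to make a kernel input-dependent: blending a small bank of candidate kernels with coefficients computed from the input's mean.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def static_local_mix(tokens, kernel):
    """Local aggregation with a fixed kernel (zero padding at the borders).

    The weights are the same for every input, which is the "static"
    behaviour the abstract attributes to conventional convolution.
    """
    half = len(kernel) // 2
    n = len(tokens)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n:
                acc += w * tokens[idx]
        out.append(acc)
    return out


def dynamic_local_mix(tokens, kernel_bank, gate_weights):
    """Local aggregation whose kernel depends on the input (illustrative).

    Assumption: a global descriptor (the token mean) is turned into softmax
    coefficients over a bank of candidate kernels, and the blended kernel is
    then applied like an ordinary convolution.
    """
    context = sum(tokens) / len(tokens)
    coeffs = softmax([g * context for g in gate_weights])
    width = len(kernel_bank[0])
    kernel = [
        sum(c * cand[j] for c, cand in zip(coeffs, kernel_bank))
        for j in range(width)
    ]
    return static_local_mix(tokens, kernel), kernel


def global_mix(tokens):
    """Global aggregation with input-dependent weights.

    Scalar self-attention with identity projections: each output is a
    softmax-weighted average of all tokens.
    """
    out = []
    for q in tokens:
        attn = softmax([q * k for k in tokens])
        out.append(sum(a * v for a, v in zip(attn, tokens)))
    return out


if __name__ == "__main__":
    bank = [[0.0, 1.0, 0.0], [1 / 3, 1 / 3, 1 / 3]]  # identity vs. box blur
    gates = [1.0, -1.0]
    for seq in ([1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0]):
        _, kernel = dynamic_local_mix(seq, bank, gates)
        print("input", seq, "-> blended kernel", kernel)
        print("global mix", global_mix(seq))
```

Running the script prints a different blended kernel for each input sequence, whereas `static_local_mix` applies the same kernel regardless of input. That difference is the representation gap between convolution and self-attention that the abstract describes.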

Code & Models

Repositories

lmmmeng/transxnet (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification