ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers

Narges Norouzi; Svetlana Orlova; Daan de Geus; Gijs Dubbelman

arXiv:2406.09936 · cs.CV · June 17, 2024

TL;DR

ALGM introduces an adaptive, two-stage token merging strategy for plain Vision Transformers in semantic segmentation: it merges similar tokens within small local windows in the first layer and across the entire image halfway through the network, improving throughput by up to 100% and mean IoU by up to +1.1.

Contribution

This paper proposes ALGM, a novel adaptive token merging method that improves semantic segmentation efficiency and accuracy in plain Vision Transformers through local and global merging stages.

Findings

1. Throughput improves by up to 100%.
2. Mean IoU increases by up to +1.1.
3. ALGM achieves a better trade-off between segmentation quality and efficiency than existing methods.

Abstract

This work presents Adaptive Local-then-Global Merging (ALGM), a token reduction method for semantic segmentation networks that use plain Vision Transformers. ALGM merges tokens in two stages: (1) in the first network layer, it merges similar tokens within a small local window, and (2) halfway through the network, it merges similar tokens across the entire image. This is motivated by an analysis in which we found that, in those situations, tokens with a high cosine similarity can likely be merged without a drop in segmentation quality. With extensive experiments across multiple datasets and network configurations, we show that ALGM not only significantly improves the throughput by up to 100%, but can also enhance the mean IoU by up to +1.1, thereby achieving a better trade-off between segmentation quality and efficiency than existing methods. Moreover, our approach is adaptive during…
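The two merging stages described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the code in tue-mps/algm-segmenter, and several details are assumptions made for illustration: the 2×2 window, the fixed threshold `tau`, the rule that a window is merged only when every pair of its tokens has cosine similarity of at least `tau`, the ToMe-style bipartite matching swapped in for the global stage, and the hypothetical helper names `local_merge`, `global_merge` and `unmerge`. Because merging is gated by a similarity threshold rather than a fixed token budget, the number of surviving tokens depends on the image content. A segmentation head needs one prediction per patch, so the sketch also records which grid cells each token covers and copies tokens back to those cells.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Cell = Tuple[int, int]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two token vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def local_merge(grid: List[List[Vector]], window: int, tau: float):
    """Stage 1 sketch (hypothetical): average each non-overlapping window x window
    block of tokens into one token when every pair in the block has cosine >= tau."""
    h, w = len(grid), len(grid[0])
    tokens: List[Vector] = []
    sources: List[List[Cell]] = []  # original grid cells covered by each token
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            cells = [(r, c) for r in range(r0, min(r0 + window, h))
                     for c in range(c0, min(c0 + window, w))]
            vecs = [grid[r][c] for r, c in cells]
            if all(cosine(vecs[i], vecs[j]) >= tau
                   for i in range(len(vecs)) for j in range(i + 1, len(vecs))):
                tokens.append([sum(col) / len(vecs) for col in zip(*vecs)])
                sources.append(cells)
            else:  # dissimilar block: keep every token as-is
                for cell, v in zip(cells, vecs):
                    tokens.append(list(v))
                    sources.append([cell])
    return tokens, sources


def global_merge(tokens: List[Vector], sources: List[List[Cell]], tau: float):
    """Stage 2 sketch (ToMe-style bipartite matching, a swapped-in technique):
    alternate tokens into sets A and B; fold each A token into its most similar
    B token when their cosine similarity is >= tau (size-weighted mean)."""
    a_idx = list(range(0, len(tokens), 2))
    b_idx = list(range(1, len(tokens), 2))
    merged = {j: list(tokens[j]) for j in b_idx}
    merged_src = {j: list(sources[j]) for j in b_idx}
    kept_a = []
    for i in a_idx:
        if b_idx:
            j = max(b_idx, key=lambda k: cosine(tokens[i], tokens[k]))
            if cosine(tokens[i], tokens[j]) >= tau:
                n_i, n_j = len(sources[i]), len(merged_src[j])
                merged[j] = [(n_i * x + n_j * y) / (n_i + n_j)
                             for x, y in zip(tokens[i], merged[j])]
                merged_src[j].extend(sources[i])
                continue
        kept_a.append(i)  # no sufficiently similar partner: keep unmerged
    out_tokens = [list(tokens[i]) for i in kept_a] + [merged[j] for j in b_idx]
    out_src = [list(sources[i]) for i in kept_a] + [merged_src[j] for j in b_idx]
    return out_tokens, out_src


def unmerge(tokens: List[Vector], sources: List[List[Cell]], h: int, w: int):
    """Copy each token back to every grid cell it covers (one output per patch)."""
    grid: List[List[Vector]] = [[[] for _ in range(w)] for _ in range(h)]
    for vec, cells in zip(tokens, sources):
        for r, c in cells:
            grid[r][c] = vec
    return grid


# Toy 4x4 grid of 2-D "tokens": three uniform 2x2 blocks and one mixed block.
X, Y = [1.0, 0.0], [0.0, 1.0]
grid = [[X, X, Y, Y],
        [X, X, Y, Y],
        [X, Y, X, X],
        [Y, X, X, X]]
local_tokens, local_sources = local_merge(grid, window=2, tau=0.9)
global_tokens, global_sources = global_merge(local_tokens, local_sources, tau=0.9)
print(len(local_tokens), len(global_tokens))  # 7 3
```

On this toy grid the local stage reduces 16 tokens to 7 and the global stage reduces them to 3. Two identical `[0.0, 1.0]` tokens survive because a single bipartite round never merges two tokens that both fall in set B; the sketch accepts this limitation for simplicity.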

Code & Models

Repositories

tue-mps/algm-segmenter (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications