MPM: Mutual Pair Merging for Efficient Vision Transformers
Simon Ravé, Pejman Rasti, David Rousseau

TL;DR
This paper introduces Mutual Pair Merging (MPM), a training-free token aggregation method that accelerates vision transformers for segmentation by reducing latency and increasing throughput with minimal accuracy loss.
Contribution
MPM is a simple, reconstruction-aware, training-free token merging technique that improves end-to-end latency and throughput in vision transformer segmentation tasks.
Findings
MPM reduces per-image latency by up to 60% on Raspberry Pi 5.
MPM increases throughput by up to 20% on NVIDIA H100 with FlashAttention-2.
MPM keeps the mIoU drop below 3% while improving speed and efficiency.
Abstract
Decreasing sequence length is a common way to accelerate transformers, but prior token reduction work often targets classification and reports proxy metrics rather than end-to-end latency. For semantic segmentation, token reduction is further constrained by the need to reconstruct dense, pixel-aligned features, and on modern accelerators the overhead of computing merge maps can erase expected gains. We propose Mutual Pair Merging (MPM), a training-free token aggregation module that forms mutual nearest-neighbor pairs in cosine space, averages each pair, and records a merge map enabling a gather-based reconstruction before the decoder so that existing segmentation heads can be used unchanged. MPM introduces no learned parameters and no continuous compression knob (no keep-rate or threshold). The speed-accuracy trade-off is set by a discrete insertion schedule. We benchmark end-to-end…
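The merge and reconstruction steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `mutual_pair_merge` and `unmerge`, the single-image `(N, D)` token layout, the dense similarity matrix, and the choice of the lower index as each pair's representative are all assumptions made for illustration. The sketch pairs tokens that are each other's nearest neighbor in cosine space, averages each pair, and records a merge map that a gather can use to restore the original token count.

```python
import numpy as np


def mutual_pair_merge(x):
    """Merge mutual nearest-neighbor token pairs (illustrative sketch only).

    x: (N, D) array of token features for one image (assumed layout).
    Returns (merged, merge_map): merged is (M, D) with M <= N, and
    merge_map is (N,) giving, for each original token, its row in merged.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]

    # Cosine similarity between all token pairs; self-matches are excluded.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    xn = x / np.maximum(norms, 1e-12)
    sim = xn @ xn.T
    np.fill_diagonal(sim, -np.inf)

    # Nearest neighbor of each token, then keep only mutual pairs.
    nn = sim.argmax(axis=1)
    idx = np.arange(n)
    mutual = nn[nn] == idx

    # Assumption: the lower index represents a pair; unpaired tokens
    # represent themselves.
    rep = np.where(mutual, np.minimum(idx, nn), idx)

    # Compact representatives into consecutive rows and record the merge map.
    keep, merge_map = np.unique(rep, return_inverse=True)

    # Average the members of each group (groups have size 1 or 2).
    merged = np.zeros((keep.size, x.shape[1]))
    np.add.at(merged, merge_map, x)
    counts = np.bincount(merge_map, minlength=keep.size)
    merged /= counts[:, None]
    return merged, merge_map


def unmerge(merged, merge_map):
    """Gather-based reconstruction: copy each merged row back to its members."""
    return merged[merge_map]


# Usage: five tokens, two of which form obvious mutual pairs.
tokens = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0]])
merged, merge_map = mutual_pair_merge(tokens)
restored = unmerge(merged, merge_map)
print(merged.shape, merge_map.tolist(), restored.shape)
```

Because a token has only one nearest neighbor, it can belong to at most one mutual pair, so no threshold or keep-rate is needed in this sketch. The reconstructed array has the same shape as the input, which is what lets a dense decoder consume it unchanged. The dense N-by-N similarity matrix is used here only for clarity.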
