Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation
Abhishek Aich, Yumin Suh, Samuel Schulter, Manmohan Chandraker

TL;DR
This paper introduces PRO-SCALE, a method that progressively reduces token length in transformer encoders for segmentation, significantly lowering computational costs while maintaining performance.
Contribution
PRO-SCALE is a novel strategy that progressively scales token length across encoder layers, improving efficiency in transformer-based segmentation models.
Findings
52% reduction in encoder GFLOPs without performance loss
27% overall GFLOPs reduction with maintained accuracy
Demonstrated flexibility across different architectural configurations
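The two GFLOPs figures are consistent with the abstract's observation that the transformer encoder accounts for about half of Mask2Former's compute. Below is a minimal back-of-envelope sketch of that relationship. It assumes PRO-SCALE changes only the encoder's cost and leaves the rest of the model's cost unchanged. The function name `overall_reduction` is purely illustrative.

```python
# Back-of-envelope check: how an encoder-only saving translates into a
# whole-model saving. Assumption: the non-encoder cost is unchanged.

def overall_reduction(encoder_share, encoder_reduction):
    """Fraction of total GFLOPs saved when only the encoder becomes cheaper."""
    return encoder_share * encoder_reduction

# Encoder share of ~50% (from the abstract) and a 52% encoder saving (Findings).
estimate = overall_reduction(0.50, 0.52)
print(round(estimate, 2))  # 0.26

# Encoder share that would reproduce the reported 27% overall saving exactly.
implied_share = 0.27 / 0.52
print(round(implied_share, 3))  # 0.519
```

A 50% share gives an estimated 26% overall saving, close to the reported 27%. The abstract's 50% is a rounded figure, and an encoder share of roughly 52% would reproduce 27% exactly.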
Abstract
A powerful architecture for universal segmentation relies on transformers that encode multi-scale image features and decode object queries into mask predictions. With efficiency being a high priority for scaling such models, we observed that the state-of-the-art method Mask2Former spends 50% of its compute on the transformer encoder alone. This is due to the retention of a full-length token-level representation of all backbone feature scales at each encoder layer. With this observation, we propose a strategy termed PROgressive Token Length SCALing for Efficient transformer encoders (PRO-SCALE) that can be plugged into the Mask2Former segmentation architecture to significantly reduce the computational cost. The underlying principle of PRO-SCALE is: progressively scale the length of the tokens with the layers of the encoder. This allows PRO-SCALE to reduce computations by a large margin…
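The principle of tying token length to encoder depth can be illustrated by counting the tokens each layer processes. This is a hypothetical sketch built on several assumptions that do not come from this summary. It uses three backbone scales at strides 8, 16, and 32, six encoder layers, and a 512x512 input. It treats per-layer cost as proportional to token count, which roughly holds for deformable attention and feed-forward layers. It also uses an illustrative schedule that starts with only the coarsest scale and adds finer scales in deeper layers. The schedule and the helper names `tokens_per_scale` and `encoder_token_cost` are illustrative, not the paper's configuration.

```python
# Hypothetical sketch: total tokens processed by a full-length multi-scale
# encoder versus a progressive token-length schedule.
# Assumptions: strides 8/16/32, 6 layers, cost proportional to token count.

def tokens_per_scale(height, width, strides=(32, 16, 8)):
    """Number of tokens contributed by each feature map, coarsest first."""
    return [(height // s) * (width // s) for s in strides]

def encoder_token_cost(scale_tokens, scales_per_layer):
    """Sum of tokens over all layers.

    scales_per_layer[i] is how many scales (coarsest first) layer i processes.
    """
    return sum(sum(scale_tokens[:k]) for k in scales_per_layer)

scale_tokens = tokens_per_scale(512, 512)
# Full-length encoder: every layer keeps all three scales.
full = encoder_token_cost(scale_tokens, [3] * 6)
# Illustrative progressive schedule: token length grows with depth.
progressive = encoder_token_cost(scale_tokens, [1, 1, 2, 2, 3, 3])

print(scale_tokens, full, progressive, round(1 - progressive / full, 3))
# [256, 1024, 4096] 32256 13824 0.571
```

Under these assumptions the progressive schedule processes 13,824 tokens against 32,256 for the full-length encoder, about 57% fewer. This only illustrates the mechanism and does not reproduce the reported 52% encoder saving. The saving is dominated by how many layers skip the finest scale, because the stride-8 map holds most of the tokens.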
