ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation
Lingfeng Wang, Hualing Lin, Senda Chen, Tao Wang, Changxu Cheng, Yangyang Zhong, Dong Zheng, and Wuyue Zhao

TL;DR
ALTo introduces an adaptive-length tokenizer for autoregressive mask generation, enabling multimodal models to adapt the number of mask tokens to visual complexity, improving segmentation performance and efficiency.
Contribution
The paper presents a novel adaptive-length tokenizer and a seamless integration method into multimodal large language models, enhancing mask quality and computational efficiency.
Findings
Achieves state-of-the-art results on popular segmentation benchmarks
Demonstrates improved efficiency with adaptive token costs
Provides a flexible framework for mask quality and efficiency trade-offs
Abstract
While humans effortlessly draw visual objects and shapes by adaptively allocating attention based on their complexity, existing multimodal large language models (MLLMs) remain constrained by rigid token representations. Bridging this gap, we propose ALTo, an adaptive-length tokenizer for autoregressive mask generation. To achieve this, a novel token length predictor is designed, along with a length regularization term and a differentiable token chunking strategy. We further build ALToLLM, which seamlessly integrates ALTo into an MLLM. Preferences on the trade-off between mask quality and efficiency are implemented through group relative policy optimization (GRPO). Experiments demonstrate that ALToLLM achieves state-of-the-art performance with adaptive token cost on popular segmentation benchmarks. Code and models are released at https://github.com/yayafengzi/ALToLLM.
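The abstract does not spell out how GRPO encodes a preference between mask quality and token cost. As a rough illustration only, the sketch below shows the group-relative advantage normalization that GRPO is generally built on, applied to a hypothetical reward that subtracts a token-usage penalty from mask IoU. The reward form, the names `tradeoff_reward` and `length_weight`, the sample values, and the use of the population standard deviation are all assumptions made for this example and are not taken from the paper or its code.

```python
from statistics import mean, pstdev


def tradeoff_reward(iou, num_tokens, max_tokens, length_weight):
    """Hypothetical reward: mask quality (IoU) minus a penalty on token usage."""
    return iou - length_weight * (num_tokens / max_tokens)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four made-up sampled masks for one prompt: (IoU, number of mask tokens used).
samples = [(0.90, 32), (0.88, 8), (0.70, 4), (0.91, 16)]

rewards = [
    tradeoff_reward(iou, n, max_tokens=32, length_weight=0.1)
    for iou, n in samples
]
advantages = group_relative_advantages(rewards)

# Index of the sample that the policy update would favor most.
best = max(range(len(samples)), key=lambda i: advantages[i])
```

In this toy setup, raising `length_weight` shifts the highest advantage toward samples that use fewer tokens, which is one plausible way a quality versus efficiency preference could be expressed.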
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Computer Graphics and Visualization Techniques
Methods: Softmax · Attention Is All You Need
