ReGATE: Learning Faster and Better with Fewer Tokens in MLLMs
Chaoyu Li, Yogesh Kulkarni, Pooyan Fazli

TL;DR
ReGATE is an adaptive token pruning method that accelerates multimodal large language model training by dynamically selecting informative tokens, reducing computation and token usage while maintaining or improving accuracy.
Contribution
ReGATE introduces a teacher-guided token elision technique that speeds up MLLM training without changing the model architecture.
Findings
ReGATE matches the peak accuracy of standard training on MVBench up to 2× faster while using only 38% of the tokens.
Extended training with ReGATE surpasses baseline accuracy across multiple benchmarks.
Total token usage is reduced by over 41% with ReGATE.
Abstract
The computational cost of training multimodal large language models (MLLMs) grows rapidly with the number of processed tokens. Existing efficiency methods mainly target inference via token reduction or merging, offering limited benefits during training. We introduce ReGATE (Reference-Guided Adaptive Token Elision), an adaptive token pruning method for accelerating MLLM training. ReGATE adopts a teacher-student framework, in which a frozen teacher LLM provides per-token guidance losses that are fused with an exponential moving average of the student's difficulty estimates. This adaptive scoring mechanism dynamically selects informative tokens while skipping redundant ones in the forward pass, substantially reducing computation without altering the model architecture. Across three representative MLLMs, ReGATE matches the peak accuracy of standard training on MVBench up to 2…
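The abstract describes the adaptive scoring only at a high level. The following is a minimal sketch of one way a per-token score could fuse a frozen teacher's guidance loss with an exponential moving average (EMA) of the student's per-token difficulty, then keep only the top-scoring tokens for the forward pass. The fusion rule (a convex combination with weight `weight`), the EMA decay, the keep ratio, and the function names `update_ema`, `fuse_scores`, and `select_tokens` are hypothetical assumptions chosen for illustration. They are not the authors' implementation.

```python
import math
from typing import List


def update_ema(ema: List[float], student_loss: List[float], decay: float = 0.9) -> List[float]:
    """Update the per-token EMA of student difficulty (assumed form: decay * old + (1 - decay) * new)."""
    return [decay * e + (1.0 - decay) * s for e, s in zip(ema, student_loss)]


def fuse_scores(teacher_loss: List[float], ema: List[float], weight: float = 0.5) -> List[float]:
    """Fuse teacher guidance loss with the student EMA difficulty.

    Hypothetical fusion rule: a convex combination; the paper's exact rule is not given here.
    """
    return [weight * t + (1.0 - weight) * e for t, e in zip(teacher_loss, ema)]


def select_tokens(scores: List[float], keep_ratio: float) -> List[int]:
    """Return the positions of the top ceil(keep_ratio * n) tokens, in original order.

    Tokens not returned would be skipped (elided) in the student's forward pass.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n = len(scores)
    k = max(1, math.ceil(keep_ratio * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # preserve sequence order for the kept tokens


if __name__ == "__main__":
    teacher = [0.1, 2.0, 0.3, 1.5, 0.05]   # illustrative per-token teacher losses
    student = [0.2, 1.8, 0.1, 2.2, 0.0]    # illustrative per-token student losses
    ema = update_ema([0.0] * len(student), student, decay=0.9)
    scores = fuse_scores(teacher, ema, weight=0.5)
    print(select_tokens(scores, keep_ratio=0.4))  # keeps the two highest-scoring tokens
```

With the illustrative values above, tokens 1 and 3 carry the highest fused scores and are kept, while the low-loss tokens are elided. Keeping the selected indices in their original order is a design choice that preserves positional structure for the student; a real training loop would recompute the EMA each step so that the selection adapts as the student improves.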
