VCC: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
Zhanpeng Zeng, Cole Hawkins, Mingyi Hong, Aston Zhang, Nikolaos Pappas, Vikas Singh, Shuai Zheng

TL;DR
This paper introduces VCC, a method that lets Transformers handle ultra long sequences efficiently by prioritizing important tokens, scaling to 128K tokens or more with improved accuracy and efficiency.
Contribution
The paper proposes VIP-token centric compression (VCC), which compresses the sequence at each layer according to how much each token matters to a small set of VIP-tokens, improving the scalability and performance of Transformers on ultra long sequences.
Findings
Achieves over 3x efficiency gain on 4K and 16K sequences
Offers competitive or better performance across various tasks
Scales effectively to sequences of 128K tokens or more
Abstract
Transformers are central in modern natural language processing and computer vision applications. Despite recent works devoted to reducing the quadratic cost of such models (as a function of the sequence length), dealing with ultra long sequences (e.g., with more than 16K tokens) remains challenging. Applications such as answering questions based on a book or summarizing a scientific article are inefficient or infeasible. Here, we propose to significantly improve the efficiency of Transformers for ultra long sequences, by compressing the sequence into a much smaller representation at each layer. Specifically, by exploiting the fact that in many tasks, only a small subset of special tokens (we call VIP-tokens) are most relevant to the final prediction, we propose a VIP-token centric compression (VCC) scheme which selectively compresses the sequence based on their impact on approximating…
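To make the motivation concrete, here is a minimal sketch of why compressing the sequence helps. It is not the paper's algorithm: the function names, the choice of 64 VIP-tokens, and the uniform mean-pooling of the remaining tokens are all assumptions made for illustration, whereas the abstract describes a selective, importance-based compression. The sketch only shows that full self-attention compares every token with every other token, so the number of pairs grows quadratically with length, and that keeping a few tokens at full resolution while shrinking the rest cuts the length that attention has to process.

```python
def attention_pairs(seq_len):
    """Query-key pairs scored by full self-attention (one head, one layer)."""
    return seq_len * seq_len


def compressed_length(seq_len, num_vip, pool):
    """Length after keeping VIP-tokens and pooling the rest in blocks of `pool`.

    Hypothetical scheme: uniform pooling stands in for the paper's
    selective, importance-based compression.
    """
    rest = seq_len - num_vip
    return num_vip + -(-rest // pool)  # ceiling division for the last block


def compress(tokens, vip_idx, pool):
    """Keep VIP rows untouched and mean-pool the other rows in order.

    `tokens` is a list of equal-length lists of floats (one row per token).
    Illustrative only, not the authors' method.
    """
    vip_set = set(vip_idx)
    vip = [tokens[i] for i in sorted(vip_set)]
    rest = [row for i, row in enumerate(tokens) if i not in vip_set]
    pooled = []
    for start in range(0, len(rest), pool):
        block = rest[start:start + pool]
        pooled.append([sum(col) / len(block) for col in zip(*block)])
    return vip + pooled


if __name__ == "__main__":
    for n in (4096, 16384, 131072):
        m = compressed_length(n, num_vip=64, pool=64)
        print(n, attention_pairs(n), m, attention_pairs(m))
```

Under these assumed settings, a 128K-token input (131072 tokens) shrinks to 2111 rows, so the pairwise work drops by several orders of magnitude. The actual savings and accuracy of VCC depend on the compression scheme described in the paper itself.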
Taxonomy
Topics: Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques · Machine Learning in Bioinformatics
