Jasper-Token-Compression-600M Technical Report
Dun Zhang, Ziyang Zeng, Yudong Zhou, Shuyang Lu

TL;DR
This report introduces Jasper-Token-Compression-600M, a bilingual (English and Chinese) model that combines knowledge distillation, contrastive learning, and a novel one-dimensional convolution-based token compression module to improve both the quality and the inference efficiency of text representations.
Contribution
The report presents a new token compression module and a training methodology that extend the distillation-based recipes of the English Stella and Jasper models to a bilingual (English and Chinese) setting, improving efficiency and robustness.
Findings
Achieves higher inference efficiency than a 0.6B model
Performs comparably to an 8B model in quality
Introduces a dynamic compression rate during training
Abstract
This technical report presents the training methodology and evaluation results of the open-source Jasper-Token-Compression-600M model, released in November 2025. Building on previous distillation-based recipes from the English Stella and Jasper models, we successfully extend this approach to a bilingual (English and Chinese) domain, further enhancing model performance through the incorporation of contrastive learning. A key innovation of our model is the introduction of a one-dimensional convolution-based token compression module. We dynamically adjust the compression rate during training, enabling the model to learn more robust and efficient compressed text representations. By combining knowledge distillation with token compression techniques, we achieve significant improvements in both embedding quality and inference efficiency. Our model performs with higher efficiency than a…
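The compression idea described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: this summary does not describe the actual module, so the function names (`compress_tokens`, `uniform_kernel`, `sample_stride`), the fixed uniform kernel, the zero padding on the right, and the candidate strides are all assumptions made for illustration. In a trained model the kernel weights would be learned. The sketch only shows how a strided one-dimensional convolution over the token axis shortens a sequence, and how a compression rate could be drawn at random for each training step.

```python
import math
import random


def uniform_kernel(size):
    """Hypothetical stand-in for learned weights: an averaging kernel."""
    return [1.0 / size] * size


def compress_tokens(tokens, kernel, stride):
    """Strided 1D convolution along the token axis.

    tokens: list of token vectors (lists of floats of equal length).
    kernel: one scalar weight per window position, shared across all
            embedding dimensions (a simplification chosen for this sketch).
    stride: number of input tokens consumed per output token.
    Returns ceil(len(tokens) / stride) compressed vectors.
    """
    if not tokens:
        return []
    dim = len(tokens[0])
    n_out = math.ceil(len(tokens) / stride)
    compressed = []
    for i in range(n_out):
        start = i * stride
        vec = [0.0] * dim
        for j, weight in enumerate(kernel):
            pos = start + j
            if pos >= len(tokens):
                break  # positions past the end act as zero padding
            for d in range(dim):
                vec[d] += weight * tokens[pos][d]
        compressed.append(vec)
    return compressed


def sample_stride(rng, choices=(1, 2, 4, 8)):
    """Assumed scheme: pick a compression rate at random per training step."""
    return rng.choice(choices)


if __name__ == "__main__":
    rng = random.Random(0)
    sequence = [[float(t), float(t) + 0.5] for t in range(10)]
    for step in range(3):
        stride = sample_stride(rng)
        out = compress_tokens(sequence, uniform_kernel(stride), stride)
        print(f"step {step}: stride={stride}, {len(sequence)} -> {len(out)} tokens")
```

Under these assumptions, a sequence of n tokens is reduced to ceil(n / stride) positions before the rest of the encoder runs, which is where the efficiency gain would come from. Varying the stride across steps exposes the model to several compression levels during training.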
Taxonomy
Topics: Natural Language Processing Techniques · Big Data and Digital Economy · Algorithms and Data Compression
