ReTok: Replacing Tokenizer to Enhance Representation Efficiency in Large Language Model

Shuhao Gu; Mengdi Zhao; Bowen Zhang; Liangdong Wang; Jijie Li; Guang Liu

arXiv:2410.04335 · cs.CL · October 8, 2024

TL;DR

This paper introduces ReTok, a method that replaces tokenizers in large language models to improve efficiency and decoding speed for long texts without sacrificing performance.

Contribution

The paper presents a novel tokenizer replacement approach that maintains model performance while significantly enhancing decoding speed for long inputs.

Findings

01. Maintains model performance after tokenizer replacement

02. Significantly improves decoding speed for long texts

03. Applicable across different large language models

Abstract

The tokenizer is an essential component of large language models (LLMs), and a tokenizer with a high compression rate can improve the model's representation and processing efficiency. However, a tokenizer cannot ensure a high compression rate in all scenarios, and an increase in the average input and output lengths increases the training and inference costs of the model. Therefore, it is crucial to find ways to improve the model's efficiency with minimal cost while maintaining the model's performance. In this work, we propose a method to improve model representation and processing efficiency by replacing the tokenizers of LLMs. We propose replacing the parameters of the model's input and output layers, reinitializing them from the parameters of the original model, and training these parameters while keeping the other parameters fixed. We conducted experiments on different LLMs, and the…
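The abstract describes reinitializing the input and output layers from the original model's parameters and training only those layers. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. The initialization rule is an assumption the abstract does not spell out: rows for tokens shared by both vocabularies are copied, and rows for new tokens are the average of the old rows of the pieces produced by the old tokenizer. The function names, the toy vocabularies, and the parameter-name prefixes `embed_tokens.` and `lm_head.` are all hypothetical.

```python
from typing import Callable, Dict, List

Vector = List[float]


def init_new_embeddings(
    old_vocab: Dict[str, int],
    old_embeddings: List[Vector],
    new_vocab: Dict[str, int],
    old_tokenize: Callable[[str], List[str]],
) -> List[Vector]:
    """Build an embedding table for new_vocab from the old model's table.

    Assumed rule (not stated in the abstract): copy the row for tokens that
    exist in both vocabularies; otherwise average the old rows of the pieces
    the old tokenizer splits the new token into.
    Assumes new_vocab ids are exactly 0 .. len(new_vocab) - 1.
    """
    dim = len(old_embeddings[0])
    new_embeddings: List[Vector] = [[0.0] * dim for _ in new_vocab]
    for token, new_id in new_vocab.items():
        if token in old_vocab:
            # Shared token: reuse the original parameters unchanged.
            new_embeddings[new_id] = list(old_embeddings[old_vocab[token]])
            continue
        pieces = [p for p in old_tokenize(token) if p in old_vocab]
        if not pieces:
            continue  # nothing to average: leave the row at zero
        rows = [old_embeddings[old_vocab[p]] for p in pieces]
        new_embeddings[new_id] = [sum(col) / len(rows) for col in zip(*rows)]
    return new_embeddings


def trainable_parameter_names(all_names: List[str]) -> List[str]:
    """Keep only input/output layer parameters; all others stay frozen.

    The prefixes are hypothetical and depend on the model implementation.
    """
    return [n for n in all_names if n.startswith(("embed_tokens.", "lm_head."))]


# Toy example: the old tokenizer is character-level, the new one adds "ab".
old_vocab = {"a": 0, "b": 1, "c": 2}
old_embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
new_vocab = {"a": 0, "ab": 1, "c": 2}

new_embeddings = init_new_embeddings(old_vocab, old_embeddings, new_vocab, list)
print(new_embeddings)  # [[1.0, 0.0], [0.5, 0.5], [2.0, 2.0]]
```

The same routine would be applied to the output layer's weight matrix. In a real training loop, only the names returned by `trainable_parameter_names` would receive gradient updates, which matches the abstract's statement that the other parameters are kept fixed.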

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques
