Peek2: Regex-free Byte-level Byte-Pair Encoding Pretokenizer for LLM Inference on Edge Devices
Liu Zai, Iraklis Klampanos

TL;DR
Peek2 is a new byte-level pretokenizer that replaces regex-based methods, offering linear time complexity and improved throughput for edge-device inference without changing the tokenization output.
Contribution
It introduces a regex-free, linear-time pretokenizer that is compatible with popular models and optimized for edge inference scenarios.
Findings
Increases microbenchmark throughput by up to 2.48x.
Achieves a 1.14x overall throughput improvement.
Produces tokenization results identical to those of the baseline regex-based tokenizer.
Abstract
Pretokenization is a crucial, sequential pass in Byte-level BPE tokenizers, yet little work has been done to optimize it for edge-side inference. Our proposed implementation, Peek2, serves as a drop-in replacement for the cl100k-like pretokenizers used in GPT-3, LLaMa-3, and Qwen-2.5. After breaking down and analyzing the logic of the original cl100k pretokenizer, we introduce a new pretokenization algorithm with linear time complexity and constant, trivial memory usage, suited for edge scenarios. Test results show that it increases microbenchmark throughput by up to 2.48x and delivers a 1.14x improvement in overall throughput across the entire Byte-level BPE encoding process, depending on the dataset, while producing results identical to those of the baseline regex-based tokenizer.
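To make the idea of a regex-free, single-pass pretokenizer concrete, here is a minimal sketch. It is not the Peek2 algorithm and is not taken from the paper. It assumes a reduced rule set loosely modeled on cl100k-style splitting (letter runs with one optional leading character, number runs of at most three characters, punctuation runs with an optional leading space, whitespace runs), omits contractions and the newline-specific rules, and works on Python strings rather than raw bytes. The names `kind` and `pretokenize` are hypothetical.

```python
def kind(ch: str) -> str:
    """Classify one character: L(etter), N(umber), S(pace), O(ther)."""
    if ch.isalpha():
        return "L"
    if ch.isnumeric():
        return "N"
    if ch.isspace():
        return "S"
    return "O"


def pretokenize(text: str) -> list[str]:
    """Split text in one left-to-right pass with a one-character lookahead.

    Illustrative reduced rule set (an assumption, not the paper's rules).
    """
    pieces = []
    i, n = 0, len(text)
    while i < n:
        start = i
        k = kind(text[i])
        # Peek at the next character's class ("" at end of input).
        k_next = kind(text[i + 1]) if i + 1 < n else ""

        if k == "L" or (k in "SO" and text[i] not in "\r\n" and k_next == "L"):
            # Letter run, optionally led by one non-letter, non-number,
            # non-newline character (for example " world").
            i += 1
            while i < n and kind(text[i]) == "L":
                i += 1
        elif k == "N":
            # Number run, capped at three characters per piece.
            i += 1
            while i < n and i - start < 3 and kind(text[i]) == "N":
                i += 1
        elif k == "O" or (text[i] == " " and k_next == "O"):
            # Punctuation run, optionally led by a single space.
            i += 1
            while i < n and kind(text[i]) == "O":
                i += 1
        else:
            # Whitespace run. If something follows it, leave the last
            # whitespace character so it can lead the next piece.
            while i < n and kind(text[i]) == "S":
                i += 1
            if i < n and i - start > 1:
                i -= 1
        pieces.append(text[start:i])
    return pieces


sample = "Hello, world! 12345  ok"
pieces = pretokenize(sample)
```

Each character is classified a bounded number of times and the scanner never backtracks by more than one position, so the pass is linear in the input length and needs only a few integers of state beyond the output list. Concatenating the pieces reproduces the input exactly.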
