Peek2: Regex-free Byte-level Byte-Pair Encoding Pretokenizer for LLM Inference on Edge Devices

Liu Zai; Iraklis Klampanos

arXiv:2601.05833·cs.CL·May 4, 2026

TL;DR

Peek2 is a regex-free pretokenizer for byte-level BPE that replaces regex-based pretokenizers. It runs in linear time and improves throughput for edge-device inference while producing identical tokenization.

Contribution

It introduces a regex-free, linear-time pretokenizer that serves as a drop-in replacement for the cl100k-like pretokenizers of popular models and is suited to edge inference scenarios.

Findings

01

Increases microbenchmark throughput by up to 2.48x.

02

Achieves a 1.14x improvement in overall throughput across the entire byte-level BPE encoding process, depending on the dataset.

03

Produces the same tokenization results as the baseline regex-based tokenizer.

Abstract

Pretokenization is a crucial, sequential pass in byte-level BPE tokenizers, yet little work has been done to optimize it for edge-side inference. Our proposed implementation, Peek2, serves as a drop-in replacement for the cl100k-like pretokenizers used in GPT-3, LLaMa-3, and Qwen-2.5. After breaking down and analyzing the logic of the original cl100k pretokenizer, we introduce a new pretokenization algorithm with linear time complexity and constant, trivial memory usage, suited to edge scenarios. Test results show that, depending on the dataset, it increases microbenchmark throughput by up to $2.48\times$ and delivers a $1.14\times$ improvement in overall throughput across the entire byte-level BPE encoding process, while producing results identical to those of the baseline regex-based tokenizer.
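
To make the idea of a regex-free pretokenizer concrete, the sketch below replaces a regex with a hand-written, single-pass scanner that uses one character of lookahead. It is not the authors' implementation, which this page does not describe. It assumes a simplified, ASCII-only stand-in for a cl100k-style pattern (the real pattern uses the Unicode classes \p{L} and \p{N}, which Python's `re` module lacks), and the names `REFERENCE`, `is_other`, and `pretokenize` are hypothetical. The regex is kept only as a baseline for checking that both approaches produce the same pieces.

```python
import re
import string

LETTERS = frozenset(string.ascii_letters)
DIGITS = frozenset(string.digits)
SPACES = frozenset(" \t\n\r\f\v")  # what \s matches under re.ASCII
NEWLINES = frozenset("\r\n")
CONTRACT1 = frozenset("sStTmMdD")  # one-letter contraction suffixes

# Regex baseline. ASSUMPTION: ASCII-only simplification of a cl100k-style pattern.
REFERENCE = re.compile(
    r"(?i:'s|'t|'re|'ve|'m|'ll|'d)"
    r"|[^\r\nA-Za-z0-9]?[A-Za-z]+"
    r"|[0-9]{1,3}"
    r"| ?[^\sA-Za-z0-9]+[\r\n]*"
    r"|\s*[\r\n]+"
    r"|\s+(?!\S)"
    r"|\s+",
    re.ASCII,
)


def is_other(c):
    """True for one character that is neither letter, digit, nor whitespace."""
    return c != "" and c not in LETTERS and c not in DIGITS and c not in SPACES


def pretokenize(text):
    """Split text with a hand-written scanner instead of a regex engine."""
    pieces, i, n = [], 0, len(text)
    while i < n:
        c = text[i]
        nxt = text[i + 1] if i + 1 < n else ""  # one-character lookahead
        if c == "'" and nxt in CONTRACT1:
            j = i + 2
        elif c == "'" and text[i + 1:i + 3].lower() in ("re", "ve", "ll"):
            j = i + 3
        elif c in LETTERS or (nxt in LETTERS and c not in NEWLINES and c not in DIGITS):
            # Letter run, optionally led by one non-alphanumeric, non-newline char.
            j = i + 1
            while j < n and text[j] in LETTERS:
                j += 1
        elif c in DIGITS:
            # At most three digits per piece.
            j = i + 1
            while j < min(n, i + 3) and text[j] in DIGITS:
                j += 1
        elif is_other(c) or (c == " " and is_other(nxt)):
            # Punctuation run, optionally led by one space, plus trailing newlines.
            j = i + 1
            while j < n and is_other(text[j]):
                j += 1
            while j < n and text[j] in NEWLINES:
                j += 1
        else:
            # Whitespace run [i, end); remember the last newline inside it.
            end, last_nl = i, -1
            while end < n and text[end] in SPACES:
                if text[end] in NEWLINES:
                    last_nl = end
                end += 1
            if last_nl >= 0:
                j = last_nl + 1  # stop right after the last newline
            elif end == n or end - i == 1:
                j = end          # trailing run, or a lone whitespace char
            else:
                j = end - 1      # leave one space to prefix the next piece
        pieces.append(text[i:j])
        i = j
    return pieces


print(pretokenize("It's 2026!"))  # ['It', "'s", ' ', '202', '6', '!']
```

Each branch consumes at least one character, and a character is rescanned only a bounded number of times (the whitespace branch may hand part of its run back to the next iteration), so the scan is linear in the input length. The only state is a few indices; the `pieces` list is the output rather than working memory.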
