From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction
Mingcheng Zhu, Zhiyao Luo, Yu Liu, Tingting Zhu

TL;DR
This paper introduces MedTPE, a lossless token compression method for large language models in clinical prediction, reducing input size and inference time without performance loss.
Contribution
MedTPE extends standard tokenization by merging frequently co-occurring medical token pairs into composite tokens, enabling efficient, lossless compression with minimal fine-tuning.
Findings
Reduces token length by up to 31% in clinical datasets.
Decreases inference latency by 34-63%.
Maintains or improves prediction accuracy.
Abstract
By processing electronic health records (EHRs) as natural language sequences, large language models (LLMs) have shown potential in clinical prediction tasks such as mortality prediction and phenotyping. However, longitudinal or high-frequency EHRs often yield excessively long token sequences that result in high computational costs and even reduced performance. Existing solutions either add modules for compression or remove less important tokens, which respectively introduce additional inference latency or risk losing clinical information. To achieve lossless compression of token sequences without additional cost or loss of performance, we propose Medical Token-Pair Encoding (MedTPE), a layered method that extends standard tokenisation for EHR sequences. MedTPE merges frequently co-occurring medical token pairs into composite tokens, providing lossless compression while preserving the computational…
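The pair-merging idea in the abstract can be illustrated with a minimal sketch. It assumes a generic, BPE-style greedy procedure over token IDs produced by an existing tokenizer. This is not the authors' implementation. The function names (`learn_merges`, `encode`, `decode`), the toy token IDs, the `first_new_id` offset, and the `min_count` threshold are all hypothetical. The sketch also does not model how MedTPE decides which pairs count as "medical", how embeddings for composite tokens are initialized, or how the LLM is fine-tuned.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[int, int]
# Hypothetical merge table: composite token ID -> (left ID, right ID).
Merges = Dict[int, Pair]


def count_pairs(seqs: List[List[int]]) -> Counter:
    """Count adjacent token-ID pairs across all sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        counts.update(zip(seq, seq[1:]))
    return counts


def merge_pair(seq: List[int], pair: Pair, new_id: int) -> List[int]:
    """Replace non-overlapping occurrences of `pair` with `new_id`, scanning left to right."""
    out: List[int] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_merges(seqs: List[List[int]], num_merges: int,
                 first_new_id: int, min_count: int = 2) -> Merges:
    """Greedily merge the most frequent adjacent pair.

    Assumption: a plain BPE-style frequency rule, not MedTPE's exact selection criterion.
    """
    work = [list(s) for s in seqs]
    merges: Merges = {}
    next_id = first_new_id
    for _ in range(num_merges):
        counts = count_pairs(work)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < min_count:
            break  # stop once no pair is frequent enough to be worth a new token
        merges[next_id] = pair
        work = [merge_pair(s, pair, next_id) for s in work]
        next_id += 1
    return merges


def encode(seq: List[int], merges: Merges) -> List[int]:
    """Apply merges in the order they were learned (dicts keep insertion order)."""
    out = list(seq)
    for new_id, pair in merges.items():
        out = merge_pair(out, pair, new_id)
    return out


def decode(seq: List[int], merges: Merges) -> List[int]:
    """Recursively expand composite tokens back to base token IDs (lossless)."""
    out: List[int] = []
    stack = list(reversed(seq))
    while stack:
        tok = stack.pop()
        if tok in merges:
            left, right = merges[tok]
            stack.append(right)  # push right first so left is expanded first
            stack.append(left)
        else:
            out.append(tok)
    return out


# Toy "EHR" sequences of base token IDs (made-up values for illustration only).
corpus = [[5, 7, 9, 5, 7, 2], [5, 7, 9, 1], [3, 5, 7, 9]]
merges = learn_merges(corpus, num_merges=10, first_new_id=100)
encoded = [encode(s, merges) for s in corpus]
print(merges)   # {100: (5, 7), 101: (100, 9)}
print(encoded)  # [[101, 100, 2], [101, 1], [3, 101]]
```

On this toy corpus the 14 base tokens shrink to 7, and `decode` reproduces every original sequence exactly. That exact recovery is what makes this kind of merging lossless. The reductions reported for MedTPE (up to 31%) come from real clinical datasets and are not reproduced by this sketch.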
