Merlin: Deterministic Byte-Exact Deduplication for Lossless Context Optimization in Large Language Model Inference

Sietse Schelpe

arXiv:2605.09990 · cs.CL · May 12, 2026

TL;DR

Merlin is a high-throughput, byte-exact deduplication engine that reduces redundant text input in large language model workflows by 13.9% to over 71%, without data loss.

Contribution

The paper introduces Merlin, a novel SIMD-friendly deduplication system with a secure architecture, designed for efficient large-scale text processing in LLM applications.

Findings

01

Input reduction ranging from 13.9% on low-redundancy datasets to over 71% on high-redundancy pipelines

02

Achieves processing throughput of up to 8.7 GB/s

03

Maintains absolute data fidelity during deduplication

Abstract

Data-intensive applications, ranging from large-scale retrieval systems to advanced data pipelines, are increasingly bottlenecked by the processing of highly redundant text corpora. We present Merlin, a local-first, agnostic, high-throughput deduplication and context optimization engine designed to mitigate these inefficiencies. Using a highly optimized, SIMD-friendly open-addressing flat hash set combined with xxHash3-64, Merlin performs rapid, byte-exact deduplication of text passages and data chunks. While broadly applicable to any text-processing workflow, its impact is particularly pronounced in Large Language Model (LLM) ecosystems such as Retrieval-Augmented Generation (RAG). Our empirical evaluations show an input reduction ranging from 13.9% on low-redundancy datasets to over 71% on high-redundancy pipelines, while maintaining absolute data fidelity. Furthermore, we…
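The abstract names the core mechanism (an open-addressing flat hash set keyed by a 64-bit hash, with byte-exact matching) but not its internals. Below is a minimal Python sketch of that idea under stated assumptions, not Merlin's implementation: BLAKE2b truncated to 64 bits stands in for xxHash3-64 so the example needs only the standard library; linear probing and a 0.75 maximum load factor are our choices; SIMD probing is not modeled; and the names `FlatHashSet`, `dedup`, and `input_reduction` (defined here as the fraction of input bytes removed) are hypothetical.

```python
import hashlib


def hash64(chunk: bytes) -> int:
    # Stand-in for xxHash3-64 (assumption): BLAKE2b truncated to 8 bytes.
    return int.from_bytes(hashlib.blake2b(chunk, digest_size=8).digest(), "little")


class FlatHashSet:
    """Hypothetical open-addressing set with linear probing.

    Each slot holds (hash, index into stored chunks) or None. A hash match is
    confirmed by comparing the stored bytes, so a 64-bit collision never
    merges two different chunks: deduplication stays byte-exact.
    """

    def __init__(self, capacity: int = 16):
        size = 1
        while size < capacity:  # power-of-two table so "& mask" replaces modulo
            size *= 2
        self._slots = [None] * size
        self._chunks = []

    def _grow(self) -> None:
        # Double the table and re-place every entry using its stored hash.
        old = self._slots
        self._slots = [None] * (len(old) * 2)
        mask = len(self._slots) - 1
        for entry in old:
            if entry is not None:
                i = entry[0] & mask
                while self._slots[i] is not None:
                    i = (i + 1) & mask
                self._slots[i] = entry

    def insert(self, chunk: bytes) -> bool:
        """Return True if chunk is new, False if it is a byte-exact duplicate."""
        if (len(self._chunks) + 1) * 4 > len(self._slots) * 3:  # load factor <= 0.75
            self._grow()
        h = hash64(chunk)
        mask = len(self._slots) - 1
        i = h & mask
        while self._slots[i] is not None:
            stored_hash, idx = self._slots[i]
            if stored_hash == h and self._chunks[idx] == chunk:
                return False  # same hash AND same bytes: true duplicate
            i = (i + 1) & mask  # linear probe to the next slot
        self._chunks.append(chunk)
        self._slots[i] = (h, len(self._chunks) - 1)
        return True


def dedup(chunks):
    # Keep the first occurrence of each chunk, preserving input order.
    seen = FlatHashSet()
    return [c for c in chunks if seen.insert(c)]


def input_reduction(before, after) -> float:
    # Hypothetical metric: fraction of input bytes removed by deduplication.
    total = sum(len(c) for c in before)
    kept = sum(len(c) for c in after)
    return 1 - kept / total if total else 0.0
```

For example, `dedup([b"alpha", b"beta", b"alpha", b"gamma", b"beta"])` keeps `[b"alpha", b"beta", b"gamma"]`, removing 9 of 23 input bytes. Storing the hash alongside each slot lets most probe mismatches be rejected with an integer comparison before any byte comparison is needed.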