Semantic Entanglement in Vector-Based Retrieval: A Formal Framework and Context-Conditioned Disentanglement Pipeline for Agentic RAG Systems
Nick Loghmani

TL;DR
This paper introduces a formal framework for understanding semantic entanglement in vector representations used in RAG systems, and proposes a preprocessing pipeline to improve retrieval precision by reducing entanglement.
Contribution
It formalizes semantic entanglement with an Entanglement Index and presents the Semantic Disentanglement Pipeline (SDP) with context-conditioned preprocessing and feedback mechanisms.
Findings
Top-K retrieval precision improved from 32% to 82% with SDP.
Mean Entanglement Index decreased from 0.71 to 0.14 after applying SDP.
Semantic entanglement captures a distinct preprocessing failure mode in RAG systems.
Abstract
Retrieval-Augmented Generation (RAG) systems depend on the geometric properties of vector representations to retrieve contextually appropriate evidence. When source documents interleave multiple topics within contiguous text, standard vectorization produces embedding spaces in which semantically distinct content occupies overlapping neighborhoods. We term this condition semantic entanglement. We formalize entanglement as a model-relative measure of cross-topic overlap in embedding space and define an Entanglement Index (EI) as a quantitative proxy. We argue that higher EI constrains attainable Top-K retrieval precision under cosine similarity retrieval. To address this, we introduce the Semantic Disentanglement Pipeline (SDP), a four-stage preprocessing framework that restructures documents prior to embedding. We further propose context-conditioned preprocessing, in which document…
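The two quantities the abstract relates, Top-K retrieval precision under cosine similarity and cross-topic overlap in embedding space, can be illustrated with a minimal sketch. This is not the paper's Entanglement Index, whose definition is not reproduced in this summary. The function `neighbor_overlap_proxy` is a hypothetical stand-in that counts how often a chunk's nearest neighbors carry a different topic label. All function names, the topic labels, and the toy vectors are assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def top_k_precision(query_vecs, query_topics, doc_vecs, doc_topics, k):
    """Fraction of the Top-K retrieved chunks whose topic matches the query's topic."""
    sims = cosine_similarity_matrix(query_vecs, doc_vecs)
    top_k = np.argsort(-sims, axis=1)[:, :k]          # indices of the k most similar chunks
    hits = doc_topics[top_k] == query_topics[:, None]
    return float(hits.mean())


def neighbor_overlap_proxy(doc_vecs, doc_topics, k):
    """Hypothetical overlap proxy (NOT the paper's EI): share of each chunk's
    k nearest neighbors that belong to a different topic."""
    sims = cosine_similarity_matrix(doc_vecs, doc_vecs)
    np.fill_diagonal(sims, -np.inf)                   # a chunk is not its own neighbor
    nn = np.argsort(-sims, axis=1)[:, :k]
    cross = doc_topics[nn] != doc_topics[:, None]
    return float(cross.mean())


# Toy 2-D "embeddings": two tight clusters of two chunks each.
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
queries = np.array([[1.0, 0.0], [0.0, 1.0]])
query_topics = np.array([0, 1])

# Case A: each cluster holds one topic (topics are geometrically separated).
separated_topics = np.array([0, 0, 1, 1])
# Case B: each cluster mixes both topics (topics share neighborhoods).
mixed_topics = np.array([0, 1, 0, 1])

for name, topics in [("separated", separated_topics), ("mixed", mixed_topics)]:
    overlap = neighbor_overlap_proxy(docs, topics, k=1)
    precision = top_k_precision(queries, query_topics, docs, topics, k=2)
    print(f"{name}: overlap proxy = {overlap:.2f}, top-2 precision = {precision:.2f}")
```

In this toy setup, the case where topics share neighborhoods scores higher on the overlap proxy and lower on Top-K precision, which is the qualitative relationship the abstract argues for. The numbers are specific to the toy data and say nothing about the results reported in the paper.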
