Más contexto no es mejor. Paradoja de la dilución vectorial en RAG corporativos
Alex Dantart

TL;DR
This paper investigates the impact of summary injection ratios in RAG systems, revealing a trade-off where moderate levels improve recall but excessive injection diminishes precision, and proposes a theoretical method to optimize this ratio.
Contribution
It introduces a theoretical framework to determine the optimal injection ratio in RAG, balancing recall and precision by analyzing the vector dilution effect.
Findings
Moderate injection increases recall by 18%.
Exceeding the critical threshold (CIR > 0.4) reduces precision by 22% for specific queries.
Proposes a theoretical model for the optimal injection ratio.
Abstract
Recent "Contextualized Chunking" techniques inject summaries to improve RAG context but introduce "vector dilution" that drowns out local content. Evaluating various injection ratios, we demonstrate an "inverted U" curve: moderate injection boosts Recall (+18%), but exceeding a critical threshold (CIR > 0.4) drops precision by 22% for specific queries. We propose a theoretical framework to calculate the optimal injection ratio.
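The dilution effect described in the abstract can be illustrated with a toy sketch. It is not the paper's model and does not reproduce the reported numbers. It assumes, purely for illustration, that a chunk's embedding behaves like a normalized convex mix of a "local content" vector and a "summary" vector, weighted by a hypothetical `ratio` parameter that stands in for the injection ratio (the exact definition of CIR is not given on this page). All names and vectors below are made up.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def mixed_embedding(local_vec, summary_vec, ratio):
    """Hypothetical chunk embedding: convex mix of local and summary vectors.

    ASSUMPTION: a linear mix is only a stand-in for what a real embedding
    model does when summary text is prepended to a chunk.
    """
    mixed = [(1 - ratio) * l + ratio * s for l, s in zip(local_vec, summary_vec)]
    return normalize(mixed)

# Toy 2-D vectors: axis 0 = local detail, axis 1 = document-level context.
LOCAL_VEC = [1.0, 0.0]
SUMMARY_VEC = [0.0, 1.0]
SPECIFIC_QUERY = [1.0, 0.0]  # asks about the chunk's local detail
BROAD_QUERY = [0.0, 1.0]     # asks about the document-level topic

def similarity_table(ratios):
    """Return (ratio, sim_to_specific_query, sim_to_broad_query) rows."""
    rows = []
    for r in ratios:
        emb = mixed_embedding(LOCAL_VEC, SUMMARY_VEC, r)
        rows.append((r, cosine(emb, SPECIFIC_QUERY), cosine(emb, BROAD_QUERY)))
    return rows

if __name__ == "__main__":
    for r, specific, broad in similarity_table([0.0, 0.2, 0.4, 0.6, 0.8]):
        print(f"ratio={r:.1f}  specific={specific:.3f}  broad={broad:.3f}")
```

In this toy setup, raising `ratio` moves the chunk toward the broad query and away from the specific one, which is the qualitative trade-off the abstract describes. The threshold at which precision actually degrades depends on the embedding model and corpus, and the sketch does not capture that.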
Taxonomy
Topics: Wikis in Education and Collaboration · Information Retrieval and Search Behavior · Data Visualization and Analytics
