Bridging OLAP and RAG: A Multidimensional Approach to the Design of Corpus Partitioning
Dario Maio, Stefano Rizzi

TL;DR
This paper applies the Dimensional Fact Model (DFM), a conceptual model originating in data warehouse design, to corpus partitioning. It combines semantic clustering with multidimensional partitioning to improve the design, robustness, and explainability of large-scale Retrieval-Augmented Generation (RAG) systems.
Contribution
It proposes the DFM as a novel approach to guide corpus partitioning in RAG, bridging OLAP multidimensional modeling with modern retrieval architectures.
Findings
DFM enables hierarchical routing and fallback strategies.
It turns retrieval from a black-box similarity search into a deterministic workflow.
It supports robust retrieval even when metadata is incomplete.
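The routing and fallback idea in these findings can be illustrated with a minimal sketch. It is not the authors' implementation: the time hierarchy (month, year, all), the `PARTITIONS` table, and the `route` function are hypothetical names chosen for illustration. The sketch assumes partitions are keyed by coordinates along one dimension and that a request with missing or unmatched metadata rolls up to the next coarser level, as an OLAP roll-up would.

```python
from typing import List, Optional, Tuple

Key = Tuple[Optional[int], Optional[int]]

# Hypothetical partitions along a time hierarchy: month -> year -> all.
# A key is (year, month); None means "rolled up at this level".
PARTITIONS = {
    (2024, 3): ["chunk-a", "chunk-b"],
    (2024, None): ["chunk-a", "chunk-b", "chunk-c"],
    (None, None): ["chunk-a", "chunk-b", "chunk-c", "chunk-d"],
}


def route(year: Optional[int], month: Optional[int]) -> Tuple[Key, List[Key]]:
    """Pick the most specific existing partition and report the keys tried."""
    # Candidate keys ordered from finest to coarsest level of the hierarchy.
    candidates: List[Key] = [(year, month), (year, None), (None, None)]
    tried: List[Key] = []
    for key in candidates:
        # A month without a year is not a valid coordinate, so skip it.
        if key[0] is None and key[1] is not None:
            continue
        if key not in tried:
            tried.append(key)
        if key in PARTITIONS:
            return key, tried
    raise LookupError("no partition covers this request")


if __name__ == "__main__":
    for request in [(2024, 3), (2024, 7), (2023, 1), (None, None)]:
        chosen, trace = route(*request)
        print(request, "->", chosen, "after trying", trace)
```

Because the candidate order is fixed, the same request always reaches the same partition, and the returned trace records which levels were tried. In a full system, similarity search would then run only inside the selected partition.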
Abstract
Retrieval-Augmented Generation (RAG) systems are increasingly deployed on large-scale document collections, often comprising millions of documents and tens of millions of text chunks. In industrial-scale retrieval platforms, scalability is typically addressed through horizontal sharding and a combination of Approximate Nearest-Neighbor search, hybrid indexing, and optimized metadata filtering. Although effective from an efficiency perspective, these mechanisms rely on bottom-up, similarity-driven organization and lack a conceptual rationale for corpus partitioning. In this paper, we claim that the design of large-scale RAG systems may benefit from the combination of two orthogonal strategies: semantic clustering, which optimizes locality in embedding space, and multidimensional partitioning, which governs where retrieval should occur based on conceptual dimensions such as time and…
Taxonomy
Topics: Information Retrieval and Search Behavior · Biomedical Text Mining and Ontologies · Semantic Web and Ontologies
