Topo-RAG: Topology-aware retrieval for hybrid text-table documents
Alex Dantart, Marco Kóvacs-Navarro

TL;DR
Topo-RAG introduces a topology-aware retrieval framework that preserves the structural relationships in hybrid text-table documents, significantly improving retrieval performance over linearization methods.
Contribution
It proposes a dual architecture that separately processes narrative and tabular data, maintaining their topology for more effective retrieval in complex enterprise datasets.
Findings
Achieves an 18.4% improvement in nDCG@10 on hybrid queries
Demonstrates the importance of preserving data topology in retrieval
Outperforms linearization-based retrieval methods
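For readers unfamiliar with the metric in the findings above, nDCG@10 can be sketched as follows. This is a minimal illustration using the common linear-gain form of DCG, not the paper's evaluation code; the function names are hypothetical, and the ideal ranking is computed from the same list of relevance labels, which is a simplification (a full evaluation would use all judged documents for the query). The summary does not say whether the 18.4% figure is a relative or an absolute gain, so the sketch makes no assumption about it.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance labels (linear gain)."""
    # Rank 0 gets discount log2(2) = 1, rank 1 gets log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of its ideal reordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels of the returned documents, in ranked order (toy data).
perfect_ranking = [3, 2, 1]
swapped_ranking = [0, 1]  # the only relevant document is ranked second
print(ndcg_at_k(perfect_ranking), ndcg_at_k(swapped_ranking))
```

A ranking that already places the most relevant documents first scores 1; pushing a relevant document down the list lowers the score through the logarithmic discount.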
Abstract
In enterprise datasets, documents are rarely pure. They are not just text, nor just numbers; they are a complex amalgam of narrative and structure. Current Retrieval-Augmented Generation (RAG) systems have attempted to address this complexity with a blunt tool: linearization. We convert rich, multidimensional tables into simple Markdown-style text strings, hoping that an embedding model will capture the geometry of a spreadsheet in a single vector. But it has already been shown that this is mathematically insufficient. This work presents Topo-RAG, a framework that challenges the assumption that "everything is text". We propose a dual architecture that respects the topology of the data: we route fluid narrative through traditional dense retrievers, while tabular structures are processed by a Cell-Aware Late Interaction mechanism, preserving their spatial relationships. Evaluated on…
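The contrast the abstract draws between a single pooled vector and late interaction can be illustrated with a toy example. The abstract does not specify how the Cell-Aware Late Interaction mechanism works, so the sketch below substitutes a generic ColBERT-style MaxSim score computed over one vector per table cell; the function names and the two-dimensional vectors are hypothetical and chosen only to make the effect visible.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mean_pool(vectors):
    """Average a list of vectors into one (the single-vector baseline)."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def pooled_score(query_vecs, cell_vecs):
    """Single-vector scoring: pool each side first, then compare once."""
    return cosine(mean_pool(query_vecs), mean_pool(cell_vecs))

def maxsim_score(query_vecs, cell_vecs):
    """Generic late interaction (MaxSim): each query vector is matched to its
    best cell vector, and the per-query maxima are summed."""
    return sum(max(cosine(q, c) for c in cell_vecs) for q in query_vecs)

# Toy data (assumption): the query asks about two distinct attributes.
query = [[1.0, 0.0], [0.0, 1.0]]
table_a = [[1.0, 0.0], [0.0, 1.0]]   # one cell per attribute
table_b = [[1.0, 1.0]]               # one cell blending both attributes

print(pooled_score(query, table_a), pooled_score(query, table_b))
print(maxsim_score(query, table_a), maxsim_score(query, table_b))
```

In this toy setup the pooled scores for the two tables are effectively identical, because averaging erases which cell carried which attribute, while the MaxSim score ranks table_a higher. A cell-aware variant would presumably also encode row and column position, which this sketch omits.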
Taxonomy
Topics: Information Retrieval and Search Behavior · Data Visualization and Analytics · Image Retrieval and Classification Techniques
