SEM-RAG: Structure-Preserving Multimodal Graph Compilation and Entropy-Guided Retrieval for Telecommunication Standards
Yuzhi Yang, Lina Bariah, Yuhuan Lu, Hang Zou, and Mérouane Debbah

TL;DR
SEM-RAG is a retrieval framework that preserves the structural logic of telecommunication documents by compiling them into typed graphs and compressing those graphs with Structural Entropy Minimization, improving retrieval accuracy and indexing efficiency.
Contribution
The paper introduces a layout-aware graph compiler and entropy-based graph compression for retrieval, tailored to structure-rich telecommunication standards.
Findings
Achieves 94.1% accuracy on TeleQnA
Reaches 93.8% accuracy on ORAN-Bench-13K
Reduces indexing token usage significantly
Abstract
Telecommunication standards pose a unique challenge for retrieval systems: accuracy depends not only on semantic relevance but also on preserving the structural logic of the documents, including the relationships encoded in tables, conditions, and formulas. When these elements are flattened into text, critical dependencies are lost, leading to unreliable retrieval. In this paper, we present SEM-RAG, an end-to-end retrieval framework built around two design choices. First, a layout-aware compiler converts text, tables, and formulas into typed graph primitives. Each table cell is linked to its row headers, column headers, predicates, and source coordinates, while each formula is converted into an operator graph tied to nearby symbol definitions. Second, the compiled graph is compressed with Structural Entropy Minimization (SEM), which avoids LLM-based bottom-up clustering…
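To make the table-compilation step in the abstract concrete, here is a minimal Python sketch of how a table could be turned into typed graph primitives, with each cell linked to its row header, column header, and source coordinates. It is an illustration only, not the authors' compiler: the names `Node`, `TypedGraph`, `compile_table`, and the edge types `HAS_ROW_HEADER` and `HAS_COL_HEADER` are assumptions made for this example, and predicates and formula operator graphs are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """A typed graph node (hypothetical schema, not from the paper)."""
    id: str
    kind: str                                  # "cell", "row_header", or "col_header"
    text: str
    coords: Optional[Tuple[int, int]] = None   # (row, col) in the source table


@dataclass
class TypedGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (src, type, dst)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, src: str, edge_type: str, dst: str) -> None:
        self.edges.append((src, edge_type, dst))

    def neighbors(self, src: str, edge_type: str) -> List[Node]:
        """Return nodes reachable from `src` over edges of the given type."""
        return [self.nodes[d] for s, t, d in self.edges if s == src and t == edge_type]


def compile_table(table_id: str, col_headers: List[str],
                  rows: List[Tuple[str, List[str]]]) -> TypedGraph:
    """Compile a table into a typed graph.

    `rows` is a list of (row_header, cell_values) pairs. Every cell keeps an
    explicit edge to its row header and column header, so the dependency
    survives even when the cell text is retrieved on its own.
    """
    g = TypedGraph()
    for c, header in enumerate(col_headers):
        g.add_node(Node(f"{table_id}/col/{c}", "col_header", header))
    for r, (row_header, values) in enumerate(rows):
        row_id = f"{table_id}/row/{r}"
        g.add_node(Node(row_id, "row_header", row_header))
        for c, value in enumerate(values):
            cell_id = f"{table_id}/cell/{r}/{c}"
            g.add_node(Node(cell_id, "cell", value, coords=(r, c)))
            g.add_edge(cell_id, "HAS_ROW_HEADER", row_id)
            g.add_edge(cell_id, "HAS_COL_HEADER", f"{table_id}/col/{c}")
    return g


# Toy table with made-up contents (not taken from any standard).
graph = compile_table(
    "t1",
    col_headers=["Max value", "Unit"],
    rows=[("Timer A", ["10", "ms"]), ("Timer B", ["20", "ms"])],
)
cell = "t1/cell/1/0"
row = graph.neighbors(cell, "HAS_ROW_HEADER")[0].text
col = graph.neighbors(cell, "HAS_COL_HEADER")[0].text
print(f"{graph.nodes[cell].text} is the '{col}' of '{row}'")
```

Keeping the headers as separate nodes rather than copying their text into each cell means a retriever can walk from a matched cell to its context, which is the kind of dependency the abstract says is lost when tables are flattened into text.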
