LARAG: Link-Aware Retrieval Strategy for RAG Systems in Hyperlinked Technical Documentation
Giorgia Bolognesi, Claudio Estatico, Ulderico Fugacci, Isabella Mastroianni, Claudio Muselli, Luca Oneto

TL;DR
LARAG introduces a link-aware retrieval method that leverages hyperlink structures in technical documentation to improve factual accuracy and efficiency in RAG systems.
Contribution
It proposes a novel, lightweight retrieval strategy that uses hyperlink metadata to enhance RAG performance without explicit graph construction.
Findings
LARAG achieves higher BERTScore F1 than baseline RAG.
It retrieves fewer chunks and generates fewer tokens.
LARAG improves answer quality on a benchmark of twenty expert-designed queries evaluated under four prompting strategies.
Abstract
Retrieval-Augmented Generation (RAG) enhances the factual grounding of Large Language Models by conditioning their outputs on external documents. However, standard embedding-based retrievers treat naturally structured corpora, such as technical manuals, as flat collections of passages, thereby overlooking the hyperlink topology that users rely on when navigating such content. We introduce LARAG (Link-Aware RAG): a lightweight, link-aware retrieval strategy that leverages the author-defined hyperlink structure already present in HTML documentation, encoding hyperlink relations as metadata in the chunk representations and exploiting them to perform a form of graph-like retrieval of locally relevant content. In a benchmark of twenty expert-designed queries over Rulex Platform technical documentation and four prompting strategies, LARAG consistently improves answer quality, achieving…
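The abstract describes the mechanism only at a high level, so the following is a minimal sketch of one way link metadata could drive retrieval, not the authors' implementation. It assumes each chunk stores the ids of the chunks its hyperlinks point to, picks the top-scoring chunks as seeds, and then adds linked chunks that are themselves sufficiently similar to the query. The names `Chunk`, `link_aware_retrieve`, `seed_k`, and `link_threshold` are hypothetical, and the bag-of-words similarity is a stand-in for a real embedding model.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    text: str
    # Hypothetical metadata field: ids of chunks this chunk hyperlinks to.
    links: list = field(default_factory=list)


def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_aware_retrieve(query, chunks, seed_k=1, link_threshold=0.1):
    """Return chunk ids: top-k seeds first, then their relevant link targets."""
    q = embed(query)
    scores = {c.chunk_id: cosine(q, embed(c.text)) for c in chunks}
    by_id = {c.chunk_id: c for c in chunks}

    # Step 1: ordinary similarity search selects the seed chunks.
    seeds = sorted(chunks, key=lambda c: scores[c.chunk_id], reverse=True)[:seed_k]
    selected = [c.chunk_id for c in seeds]

    # Step 2: follow each seed's hyperlinks one hop, keeping only targets
    # whose own similarity to the query clears the threshold.
    for seed in seeds:
        for target in seed.links:
            if (target in by_id and target not in selected
                    and scores[target] >= link_threshold):
                selected.append(target)
    return selected


docs = [
    Chunk("install", "how to install the platform on windows", ["license"]),
    Chunk("license", "activate the license after you install"),
    Chunk("export", "export a dataset to csv", ["install"]),
]
result = link_aware_retrieve("install the platform", docs)
```

In this toy corpus the seed is the `install` chunk, and its linked `license` chunk is pulled in because it also overlaps with the query. Raising `link_threshold` restricts the result to the seed alone. No graph is built up front: the link lists travel with the chunks as metadata, which is the property the abstract emphasizes.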
