SPIRE: Structure-Preserving Interpretable Retrieval of Evidence

Mike Rainey; Umut Acar; Muhammed Sezer

arXiv:2604.20849·cs.IR·April 24, 2026

TL;DR

This paper introduces SPIRE, a structure-preserving retrieval system for semi-structured documents such as HTML. It keeps document structure intact during retrieval so that the returned evidence is more interpretable and citation-ready.

Contribution

SPIRE presents a novel, structure-aware retrieval pipeline that operates over tree-structured documents, improving evidence quality and interpretability over traditional linearized methods.

Findings

01. Higher-quality, diverse citations achieved with structure preservation.

02. Outperforms passage-based baselines on HTML question-answering benchmarks.

03. Maintains scalability while enhancing interpretability.

Abstract

Retrieval-augmented generation over semi-structured sources such as HTML is constrained by a mismatch between document structure and the flat, sequence-based interfaces of today's embedding and generative models. Retrieval pipelines often linearize documents into fixed-size chunks before indexing, which obscures section structure, lists, and tables, and makes it difficult to return small, citation-ready evidence without losing the surrounding context that makes it interpretable. We present a structure-aware retrieval pipeline that operates over tree-structured documents. The core idea is to represent candidates as subdocuments: precise, addressable selections that preserve structural identity while deferring the choice of surrounding context. We define a small set of document primitives: paths and path sets, subdocument extraction by pruning, and two contextualization mechanisms.…
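
To make the primitives named in the abstract concrete, here is a minimal sketch of paths, path sets, and subdocument extraction by pruning. It is an illustration under stated assumptions, not the authors' implementation: the names `Node`, `Path`, `extract_subdocument`, and `render` are hypothetical, a path is assumed to be a tuple of child indices from the root, and a selected node is assumed to keep its whole subtree. The two contextualization mechanisms are not sketched, because the abstract is truncated before describing them.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Assumption: a path addresses a node by the child indices taken from the root.
Path = Tuple[int, ...]


@dataclass
class Node:
    """A hypothetical tree node standing in for an HTML element."""
    label: str
    children: List["Node"] = field(default_factory=list)


def extract_subdocument(node: Node, paths: Set[Path], prefix: Path = ()) -> Node:
    """Prune the tree so only selected nodes and their ancestors remain.

    A node whose path is in `paths` is kept with its whole subtree.
    Ancestors of selected nodes are kept as structural scaffolding.
    Every other branch is dropped.
    """
    if prefix in paths:
        return node  # selected: keep the entire subtree unchanged
    kept = []
    for i, child in enumerate(node.children):
        child_path = prefix + (i,)
        # Keep the child only if it is selected or leads to a selected node.
        if any(p[:len(child_path)] == child_path for p in paths):
            kept.append(extract_subdocument(child, paths, child_path))
    return Node(node.label, kept)


def render(node: Node, depth: int = 0) -> List[str]:
    """Return an indented outline of the tree, one line per node."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


doc = Node("html", [
    Node("section: Intro", [Node("p: a"), Node("p: b")]),
    Node("section: Methods", [Node("li: x"), Node("li: y")]),
])

# Select the first list item under the second section.
sub = extract_subdocument(doc, {(1, 0)})
print("\n".join(render(sub)))
```

In this sketch the extracted subdocument keeps the `section: Methods` ancestor of the selected item, so the evidence stays addressable by the same path while sibling content is pruned away.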
