SAFE: Improving LLM Systems using Sentence-Level In-generation Attribution

João Eduardo Batista; Emil Vatai; Mohamed Wahib

arXiv:2505.12621·cs.CL·September 25, 2025

Open Access · 4 Reviews

TL;DR

SAFE introduces a sentence-level attribution framework for retrieval-augmented generation systems. By attributing each generated sentence to its source documents, it aims to make LLM outputs more trustworthy and verifiable, especially in scientific contexts.

Contribution

This work presents a novel framework for sentence-level attribution in RAG systems, improving attribution accuracy and enabling better verification of LLM-generated content.

Findings

01

Achieved 95% accuracy in predicting the number of references needed per sentence.

02

Improved attribution accuracy by 2.1-6.0% over existing algorithms.

03

Demonstrated reliable attribution in real-world, large-document scenarios.

Abstract

Large Language Models (LLMs) are increasingly applied in various science domains, yet their broader adoption remains constrained by a critical challenge: the lack of trustworthy, verifiable outputs. Current LLMs often generate answers without reliable source attribution, or worse, with incorrect attributions, posing a barrier to their use in scientific and high-stakes settings, where traceability and accountability are paramount. To be reliable, attribution systems require high accuracy for short-length attribution on retrieved data, i.e., attribution to a sentence within a document rather than the entire document. We propose SAFE, a Sentence-level Attribution FramEwork for Retrieve-Augmented Generation (RAG) systems that attributes generated sentences during generation. This allows users to verify sentences as they read them and correct the model when the attribution indicates the…
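
The two-step flow described in the abstract and reviews (first predict how many references a generated sentence needs, then attribute it to that many source sentences) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the token-overlap score, and the threshold rule standing in for the pre-attribution classifier are hypothetical and are not the authors' models or code.

```python
import re


def tokens(text):
    """Lowercased word tokens as a set (toy sentence representation)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a, b):
    """Jaccard similarity between the token sets of two sentences."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def predict_num_refs(sentence, sources, threshold=0.5):
    """Hypothetical stand-in for a pre-attribution classifier.

    Counts the sources whose overlap clears the threshold and caps the
    count at 2, mirroring the classes 0, 1, 2+. A trained classifier
    would replace this heuristic.
    """
    hits = sum(overlap(sentence, s) >= threshold for s in sources)
    return min(hits, 2)


def attribute(sentence, sources, k):
    """Return the indices of the k best-matching source sentences."""
    ranked = sorted(
        range(len(sources)),
        key=lambda i: overlap(sentence, sources[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


def attribute_stream(generated_sentences, sources):
    """Yield (sentence, source indices) as each sentence arrives."""
    for sentence in generated_sentences:
        k = predict_num_refs(sentence, sources)
        yield sentence, attribute(sentence, sources, k)


# Toy data, invented for this example.
sources = [
    "The Eiffel Tower is located in Paris.",
    "Mount Fuji is the tallest mountain in Japan.",
    "Paris is the capital of France.",
]
generated = ["The Eiffel Tower is in Paris.", "Thank you for asking."]

for sentence, refs in attribute_stream(generated, sources):
    print(sentence, "->", refs)
```

Because each sentence is handled as soon as it is available, a reader could in principle inspect its references before the next sentence is produced. A sentence predicted to need zero references skips the attribution step entirely, which is the case Reviewer 01 flags as weak.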

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 3

Strengths

The proposal of an "in-generation" (sentence-by-sentence) attribution system is a novel and highly practical contribution. This design directly addresses the user-experience bottleneck of post-hoc, document-level verification and empowers users to correct hallucinations in real time. Using a lightweight Pre-attribution (PA) classifier to first predict the required number of quotes is an efficient design. As shown in Table 5, this two-step process improves the accuracy of the final attributio

Weaknesses

The two-step design creates its own limitations. As the paper notes, the system's ability to identify sentences requiring zero attribution is weak, as most attributors are not optimized for "no match" detection. The system is thus forced to conservatively find a quote, even for common knowledge. The framework requires a dedicated dataset (like HAGRID-Clean) with sentence-level labels (0, 1, 2+) to train the Pre-attribution Classifier. This multi-stage pipeline, which requires pre-training a cla

Reviewer 02 · Rating 2 · Confidence 4

Strengths

1. **Well written paper:** The paper is very thorough and easy to follow and reproduce if wanted.
2. **Data Contribution:** The creation of the HAGRID-Clean dataset is a good resource for the community. The authors identified that existing attribution datasets suffer from noise, such as "over-referencing" and "under-referencing". By manually reviewing and re-labeling the data based on the ideal number of references, they created a high-quality benchmark.
3. **Strong Empirical Validation:** The f

Weaknesses

1. **Misleading "In-Generation" Claim and Outdated Baselines:** The paper's central claim of "in-generation" attribution is misleading. The framework operates post-hoc on a per-sentence basis; it classifies and attributes a fully generated sentence. This is fundamentally different from true in-generation attribution models (e.g., Gao et al., 2023; RARR) that co-generate text and citation markers. By not comparing against this dominant and more recent line of research (listed a few below), the pa

Reviewer 03 · Rating 0 · Confidence 4

Strengths

1. Open-source framework that is very lightweight.
2. HAGRID-Clean Dataset: The authors provide a manually cleaned version of the HAGRID attribution dataset to address issues with noisy data and inconsistent referencing in existing benchmarks. (It is not clear whether this dataset will be released, though.)

Weaknesses

1. Lacks comparison to the various methods for grounded generation, for example directly using an NLI model [1], or more advanced methods ([2], [3], to name a few).
2. The evaluation and comparison are on the same distribution, with little generalization testing.
3. The real-world testing section does not have evaluation results.

[1] Tianyu Gao, Howard Yen, Jiatong Yu, and Danqi Chen. Enabling large language models to generate text with citations. In Proceedings of the 2023 Conference on Empiri

Reviewer 04 · Rating 2 · Confidence 3

Strengths

* The paper presents a clear and practical engineering work.
* The provided code repo is easy to read and use, showing good reproducibility.
* The framework can be effectively applied on low-end machines, such as personal devices that only have API access to LLMs.
* The authors also contribute a manually cleaned version of the HAGRID dataset.

Weaknesses

* **The proposed framework lacks novelty.** It mainly combines two steps, pre-attribution (predicting the number of citations) and attribution (assigning references). This limits the paper’s contribution to an engineering solution instead of an innovation.
* **The technical methods used are quite standard and outdated.** For pre-attribution, the authors test Random Forest (RF), XGBoost (XGB), Multi-Layer Perceptron (MLP), and TabularNet (TN), all of which are common lightweight models. The attrib

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education