Attribution in Scientific Literature: New Benchmark and Methods

Yash Saxena; Deepa Tilwani; Ali Mohammadi; Edward Raff; Amit Sheth; Srinivasan Parthasarathy; Manas Gaur

arXiv:2405.02228 · cs.CL · April 14, 2025 · 2 cites

Open Access

TL;DR

This paper introduces REASONS, a benchmark dataset for scientific citation attribution, evaluates LLM performance and hallucination rates on it, and proposes metadata augmentation and retrieval techniques to improve reliability.

Contribution

The paper presents REASONS, a dataset with sentence-level annotations across 12 scientific domains from arXiv, and evaluates methods for reducing hallucinations in LLM-based citation tasks.

Findings

01

Top-tier LLMs achieve high sentence attribution accuracy.

02

Metadata augmentation reduces hallucination rates across tasks.

03

Retrieval-augmented generation improves indirect query performance.

Abstract

Large language models (LLMs) present a promising yet challenging frontier for automated source citation in scientific communication. Previous approaches to citation generation have been limited by citation ambiguity and LLM overgeneralization. We introduce REASONS, a novel dataset with sentence-level annotations across 12 scientific domains from arXiv. Our evaluation framework covers two key citation scenarios: indirect queries (matching sentences to paper titles) and direct queries (author attribution), both enhanced with contextual metadata. We conduct extensive experiments with models such as GPT-O1, GPT-4O, GPT-3.5, DeepSeek, and other smaller models like Perplexity AI (7B). While top-tier LLMs achieve high performance in sentence attribution, they struggle with high hallucination rates, a key metric for scientific reliability. Our metadata-augmented approach reduces hallucination…

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Scientific Computing and Data Management · Scientometrics and Bibliometrics Research

Methods: Attention Is All You Need · Label Smoothing · WordPiece · Position-Wise Feed-Forward Layer · Absolute Position Encodings · BART · Linear Warmup With Linear Decay