Citations as Queries: Source Attribution Using Language Models as Rerankers
Ryan Muther, David Smith

TL;DR
This paper investigates fine-tuned language models as rerankers for source attribution and shows that semi-supervised methods can come close to fully supervised performance without costly span-level annotation.
Contribution
It introduces a novel approach of using language models as rerankers for source attribution and compares supervised and semi-supervised methods across different datasets.
Findings
Semi-supervised reranking approaches perform nearly as well as fully supervised ones.
Language models effectively improve source attribution accuracy.
The method is evaluated on two datasets that differ in language and genre: English Wikipedia and medieval Arabic historical writing.
Abstract
This paper explores new methods for locating the sources used to write a text, by fine-tuning a variety of language models to rerank candidate sources. After retrieving candidate sources using a baseline BM25 retrieval model, a variety of reranking methods are tested to see how effective they are at the task of source attribution. We conduct experiments on two datasets, English Wikipedia and medieval Arabic historical writing, and employ a variety of retrieval-based and generation-based reranking models. In particular, we seek to understand how the degree of supervision required affects the performance of various reranking models. We find that semi-supervised methods can be nearly as effective as fully supervised methods while avoiding potentially costly span-level annotation of the target and source documents.
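The retrieve-then-rerank pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the BM25 class below is a small textbook Okapi BM25 written for illustration, and `overlap_scorer` is a hypothetical stand-in for a fine-tuned language model reranker. The function names, the parameters `k1` and `b`, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper tokenizer.
    return text.lower().split()


class BM25:
    """Small Okapi BM25 index (illustrative, not the paper's code)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]
        n = len(self.docs)
        # Document frequency: number of documents containing each term.
        df = Counter(t for d in self.docs for t in set(d))
        # Smoothed IDF, always positive.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            norm = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * tf[t] * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k):
        # Indices of the k highest-scoring documents, best first.
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


def overlap_scorer(query, doc):
    # Hypothetical placeholder for a fine-tuned LM: Jaccard token overlap.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def rerank(query, candidates, docs, scorer):
    # Reorder the retrieved candidate indices by the reranker's score.
    return sorted(candidates, key=lambda i: scorer(query, docs[i]), reverse=True)


# Toy data (assumed): the citing passage acts as the query.
docs = [
    "the caliph marched on baghdad in the spring",
    "wikipedia is a free online encyclopedia",
    "the army of the caliph entered baghdad",
]
query = "the caliph entered baghdad"

candidates = BM25(docs).top_k(query, k=2)
reranked = rerank(query, candidates, docs, overlap_scorer)
```

In a setup closer to the one the abstract describes, `overlap_scorer` would be replaced by a model that scores each (citing passage, candidate source) pair, and the amount of supervision used to train that model is the variable under study.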
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Information Retrieval and Search Behavior
