Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents
Sneha Singhania, Simon Razniewski, Gerhard Weikum

TL;DR
This paper introduces L3X, a retrieval-augmented method for extracting long object lists from long documents. It improves substantially over LLM-only generation by combining a recall-oriented generation stage with a precision-oriented validation stage.
Contribution
The paper presents a novel two-stage approach that enhances recall in long list extraction from documents using retrieval-augmented language models, outperforming LLM-only methods.
Findings
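L3X outperforms LLM-only generation by a substantial margin.
Retrieval augmentation helps the model collect cues for relevant objects that are spread across many passages of a long text.
The two-stage process pairs recall-oriented generation with precision-oriented scrutinization that validates or prunes candidates.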
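As a reminder of how recall and precision apply to list extraction, here is a minimal sketch that assumes exact, set-based matching between extracted objects and a ground-truth list. The function name and the example data are hypothetical and do not come from the paper, whose evaluation protocol may differ.

```python
def precision_recall(predicted, gold):
    """Set-based precision and recall for one extracted object list."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)  # objects that are both extracted and correct
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical example: 4 true objects, 3 extracted, 2 of them correct.
gold = ["Alice", "Bob", "Carol", "Dave"]
predicted = ["Alice", "Bob", "Eve"]
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

A long ground-truth list makes recall the harder number to raise: every missed object lowers it, even when all extracted objects are correct.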
Abstract
Methods for relation extraction from text mostly focus on high precision, at the cost of limited recall. High recall is crucial, though, to populate long lists of object entities that stand in a specific relation with a given subject. Cues for relevant objects can be spread across many passages in long texts. This poses the challenge of extracting long lists from long texts. We present the L3X method which tackles the problem in two stages: (1) recall-oriented generation using a large language model (LLM) with judicious techniques for retrieval augmentation, and (2) precision-oriented scrutinization to validate or prune candidates. Our L3X method outperforms LLM-only generations by a substantial margin.
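The two stages described in the abstract can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: the word-overlap retriever, the `toy_generator` stand-in for an LLM call, the support-count scorer, and the threshold are all hypothetical placeholders chosen so the example runs without external dependencies.

```python
def retrieve(passages, query, k):
    """Toy retriever (assumption): rank passages by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(passages,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return ranked[:k]


def generate_candidates(passages, subject, relation, generator, k):
    """Stage 1 (recall-oriented): union the candidates produced per retrieved passage."""
    candidates = []
    for passage in retrieve(passages, f"{subject} {relation}", k):
        for obj in generator(passage, subject, relation):
            if obj not in candidates:
                candidates.append(obj)
    return candidates


def scrutinize(candidates, scorer, threshold):
    """Stage 2 (precision-oriented): keep candidates whose score reaches the threshold."""
    return [c for c in candidates if scorer(c) >= threshold]


# Hypothetical stand-in for an LLM: picks known names out of a passage.
KNOWN_NAMES = {"Ron", "Hermione", "Neville", "Draco"}

def toy_generator(passage, subject, relation):
    words = [w.strip(".,;") for w in passage.split()]
    return [w for w in words if w in KNOWN_NAMES]


passages = [
    "Harry befriends Ron and Hermione.",
    "Harry befriends Neville and Ron.",
    "Harry argues with Draco.",
    "It rained all week.",
]
subject, relation = "Harry", "befriends"

# Hypothetical scorer: number of passages mentioning the candidate with the relation word.
def support(candidate):
    return sum(1 for p in passages if candidate in p and relation in p)

candidates = generate_candidates(passages, subject, relation, toy_generator, k=3)
final = scrutinize(candidates, support, threshold=1)
print(candidates)
print(final)
```

The split keeps the first stage deliberately permissive, since it accepts any object suggested by any retrieved passage, and leaves the pruning of unsupported candidates such as "Draco" to the second stage.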
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Web Data Mining and Analysis
