Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents

Sneha Singhania; Simon Razniewski; Gerhard Weikum

arXiv:2405.02732·cs.CL·March 20, 2025



TL;DR

This paper introduces L3X, a retrieval-augmented method for extracting long object lists from long documents. It combines recall-oriented generation with a precision-oriented validation stage and outperforms LLM-only generation by a substantial margin.

Contribution

The paper presents a novel two-stage approach that enhances recall in long list extraction from documents using retrieval-augmented language models, outperforming LLM-only methods.

Findings

01. L3X achieves higher recall than baseline models.

02. The retrieval augmentation improves long list extraction accuracy.

03. The two-stage process effectively balances recall and precision.
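
To make the recall and precision trade-off in these findings concrete, here is a minimal sketch of how both metrics can be computed for an extracted object list against a gold list. The function name `precision_recall` and the toy lists are illustrative assumptions, not data or code from the paper.

```python
def precision_recall(predicted, gold):
    """Precision and recall of a predicted object list against a gold list."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


# Invented toy data: four true objects for some subject and relation.
gold = ["A", "B", "C", "D"]
# A recall-oriented candidate list that also contains wrong entries.
after_generation = ["A", "B", "C", "X", "Y", "Z"]
# The same list after pruning two of the wrong entries.
after_pruning = ["A", "B", "C", "X"]

p1, r1 = precision_recall(after_generation, gold)  # (0.5, 0.75)
p2, r2 = precision_recall(after_pruning, gold)     # (0.75, 0.75)
```

In this toy case, pruning wrong candidates raises precision while leaving recall unchanged, which is the kind of balance a second validation stage aims for.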

Abstract

Methods for relation extraction from text mostly focus on high precision, at the cost of limited recall. High recall is crucial, though, to populate long lists of object entities that stand in a specific relation with a given subject. Cues for relevant objects can be spread across many passages in long texts. This poses the challenge of extracting long lists from long texts. We present the L3X method which tackles the problem in two stages: (1) recall-oriented generation using a large language model (LLM) with judicious techniques for retrieval augmentation, and (2) precision-oriented scrutinization to validate or prune candidates. Our L3X method outperforms LLM-only generations by a substantial margin.
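
The two stages described in the abstract can be sketched roughly as follows. This is a toy illustration under strong assumptions, not the authors' implementation: the keyword-overlap retriever, the capitalized-token extractor standing in for the LLM call, and the cue-word check used for validation are all hypothetical placeholders, as are every function name and the example passages.

```python
import re


def tokens(text):
    """Lowercased word set of a passage."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve_passages(passages, query_terms, top_k):
    """Toy retriever (assumption): rank passages by query-term overlap."""
    scored = [(len(tokens(p) & set(query_terms)), p) for p in passages]
    ranked = sorted(scored, key=lambda sp: sp[0], reverse=True)
    return [p for s, p in ranked[:top_k] if s > 0]


def generate_candidates(passage, subject):
    """Stand-in for an LLM call (assumption): capitalized tokens except the subject."""
    names = re.findall(r"\b[A-Z][a-z]+\b", passage)
    return [n for n in names if n != subject]


def recall_oriented_generation(passages, subject, cues, top_k=4):
    """Stage 1: gather candidates from many retrieved passages."""
    query = {subject.lower(), *cues}
    retrieved = retrieve_passages(passages, query, top_k)
    candidates = []
    for p in retrieved:
        for c in generate_candidates(p, subject):
            if c not in candidates:
                candidates.append(c)
    return candidates, retrieved


def scrutinize(candidates, retrieved, cues):
    """Stage 2 (assumption): keep a candidate only if some passage
    mentions it together with a cue word for the relation."""
    kept = []
    for c in candidates:
        if any(c.lower() in tokens(p) and tokens(p) & set(cues) for p in retrieved):
            kept.append(c)
    return kept


passages = [
    "Alice is friends with Bob and Carol.",
    "Alice met Dave, a friend from school.",
    "The weather was cold all winter.",
    "Bob and Alice, friends since childhood, laughed.",
    "Alice saw Mallory near the lake.",
]
cues = {"friend", "friends"}

candidates, retrieved = recall_oriented_generation(passages, "Alice", cues)
final = scrutinize(candidates, retrieved, cues)
# candidates: ['Bob', 'Carol', 'Dave', 'Mallory']
# final:      ['Bob', 'Carol', 'Dave']
```

The first stage deliberately over-generates by reading every retrieved passage, including one that only mentions the subject. The second stage then drops the candidate that has no supporting cue. A real system would replace both placeholders with an actual retriever and LLM.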


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Web Data Mining and Analysis
