The Power of Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval

Minghan Li; Diana Nicoleta Popa; Johan Chagnon; Yagmur Gizem Cinar; Eric Gaussier

arXiv:2111.09852 · cs.IR · October 18, 2022

TL;DR

This paper introduces a novel method for long document retrieval that involves selecting key blocks through local pre-ranking, enabling effective processing by models like BERT despite the challenges posed by long documents.

Contribution

The paper proposes a new approach of local pre-ranking to select key blocks, improving long document retrieval efficiency and effectiveness over existing truncation, segmentation, and sparse attention methods.
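The select-then-rank idea can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the fixed-size word blocks, the term-overlap scorer, and the names `split_into_blocks`, `overlap_score`, and `select_key_blocks` are hypothetical and are not taken from the paper or its repository. The sketch only shows the general pattern of scoring each block against the query locally, keeping the top-scoring blocks, and restoring document order so the shortened text can fit a length-limited model such as BERT.

```python
from collections import Counter


def split_into_blocks(text, block_size=8):
    """Split a document into consecutive blocks of `block_size` words (assumed blocking scheme)."""
    words = text.split()
    return [" ".join(words[i:i + block_size]) for i in range(0, len(words), block_size)]


def overlap_score(query, block):
    """Hypothetical local scorer: count occurrences of query terms in the block."""
    query_terms = set(query.lower().split())
    counts = Counter(block.lower().split())
    return sum(counts[term] for term in query_terms)


def select_key_blocks(query, text, block_size=8, top_k=2):
    """Keep the `top_k` highest-scoring blocks, returned in their original document order."""
    blocks = split_into_blocks(text, block_size)
    # Rank block indices by descending score; ties go to the earlier block.
    ranked = sorted(range(len(blocks)), key=lambda i: (-overlap_score(query, blocks[i]), i))
    kept = sorted(ranked[:top_k])
    return [blocks[i] for i in kept]
```

In a full pipeline, the selected blocks would be concatenated and passed, together with the query, to the neural ranker. The whitespace tokenization above is a simplification: punctuation attached to a word prevents a match, so a real scorer would normalize tokens first.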

Findings

01

The method outperforms traditional truncation and segmentation techniques.

02

It achieves comparable or better results with less computational cost.

03

Experimental results validate the effectiveness of key block selection in IR tasks.

Abstract

On a wide range of natural language processing and information retrieval tasks, transformer-based models, particularly pre-trained language models like BERT, have demonstrated tremendous effectiveness. Due to the quadratic complexity of the self-attention mechanism, however, such models have difficulty processing long documents. Recent works dealing with this issue include truncating long documents, in which case one loses potentially relevant information; segmenting them into several passages, which may miss some information and leads to high computational complexity when the number of passages is large; or modifying the self-attention mechanism to make it sparser, as in sparse-attention models, again at the risk of missing some information. We follow here a slightly different approach in which one first selects key blocks of a long document by local query-block pre-ranking, and then few…

Code & Models

Repositories

lmh0921/keyb
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Softmax · Weight Decay · Attention Dropout · Layer Normalization