PARM: A Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval

Sophia Althammer; Sebastian Hofstätter; Mete Sertkan; Suzan Verberne; Allan Hanbury

arXiv:2201.01614·cs.IR·August 16, 2022

TL;DR

This paper introduces PARM, a paragraph aggregation model for dense document-to-document retrieval, improving retrieval effectiveness in legal case datasets with limited training data by aggregating paragraph-level results.

Contribution

The paper proposes PARM, a novel paragraph-level retrieval approach with VRRF aggregation, enabling effective dense document retrieval with limited labeled data.

Findings

01 VRRF outperforms traditional rank-based aggregation methods.

02 PARM achieves higher retrieval effectiveness than document-level retrieval.

03 Effective training strategies for limited data scenarios are demonstrated.

Abstract

Dense passage retrieval (DPR) models show great effectiveness gains in first-stage retrieval for the web domain. However, the web domain offers large amounts of training data and involves a query-to-passage or query-to-document retrieval task. In this paper, we investigate dense document-to-document retrieval with limited labelled target data for training, in particular legal case retrieval. In order to use DPR models for document-to-document retrieval, we propose a Paragraph Aggregation Retrieval Model (PARM), which liberates DPR models from their limited input length. PARM retrieves documents at the paragraph level: for each query paragraph, relevant documents are retrieved based on their paragraphs. Then the relevant results per query paragraph are aggregated into one ranked list for the whole query document. For the aggregation we propose vector-based aggregation with…
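
The aggregation step described in the abstract can be illustrated with a small sketch. It fuses the per-query-paragraph result lists into one document ranking using plain reciprocal rank fusion (RRF), a rank-based baseline of the kind the findings compare against. It is not the paper's vector-based VRRF method, whose description is truncated above. The function name `aggregate_rrf`, the `(doc_id, paragraph_id)` input layout, the choice to count only each document's best-ranked paragraph per list, and the constant `k = 60` are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_rrf(per_paragraph_rankings, k=60):
    """Fuse per-query-paragraph rankings into one ranked list of documents.

    per_paragraph_rankings: one ranked list per query paragraph; each list
    holds (doc_id, paragraph_id) hits, best first. (Hypothetical layout.)
    k: RRF smoothing constant; 60 is a commonly used value, assumed here.
    """
    scores = defaultdict(float)
    for ranking in per_paragraph_rankings:
        seen = set()
        for rank, (doc_id, _paragraph_id) in enumerate(ranking, start=1):
            # Assumption: a document is credited once per query paragraph,
            # at the rank of its best-ranked paragraph in that list.
            if doc_id in seen:
                continue
            seen.add(doc_id)
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc_id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Two query paragraphs, each with its own paragraph-level result list.
rankings = [
    [("docA", "p1"), ("docB", "p3"), ("docA", "p2")],
    [("docB", "p1"), ("docC", "p4")],
]
print(aggregate_rrf(rankings))
```

In this toy input, `docB` is retrieved for both query paragraphs, so it accumulates two reciprocal-rank contributions and ranks above documents that match only one query paragraph.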

Code & Models

Repositories

sophiaalthammer/parm (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Text and Document Classification Technologies