PARM: A Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval
Sophia Althammer, Sebastian Hofstätter, Mete Sertkan, Suzan Verberne, Allan Hanbury

TL;DR
This paper introduces PARM, a paragraph aggregation model for dense document-to-document retrieval, improving retrieval effectiveness in legal case datasets with limited training data by aggregating paragraph-level results.
Contribution
The paper proposes PARM, a novel paragraph-level retrieval approach with VRRF (vector-based aggregation with reciprocal rank fusion weighting), enabling effective dense document retrieval with limited labelled data.
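VRRF combines rank information with the dense paragraph embeddings. The sketch below is one plausible reading of that idea, not the authors' exact formulation. It makes several assumptions. Embeddings are plain float lists. A candidate document is represented by the sum of its retrieved paragraph embeddings, each weighted by an RRF weight 1/(k + rank). The query document is represented by the sum of its paragraph embeddings. Documents are scored by dot product. The function names `add_scaled` and `vrrf_rank` are hypothetical.

```python
"""Hedged sketch of vector-based aggregation with RRF weighting (VRRF-style).

Assumptions (not taken from the paper): embeddings are plain float lists,
a candidate document is the RRF-weighted sum of its retrieved paragraph
embeddings, the query document is the sum of its paragraph embeddings,
and documents are scored by dot product.
"""
from collections import defaultdict


def add_scaled(acc, vec, weight):
    # In-place acc += weight * vec, element-wise.
    for i, v in enumerate(vec):
        acc[i] += weight * v


def vrrf_rank(query_embs, hits_per_query_paragraph, k=60):
    """Rank candidate documents for one query document.

    query_embs: one embedding per query paragraph.
    hits_per_query_paragraph: for each query paragraph, a ranked list of
        (doc_id, paragraph_embedding) pairs, best first.
    Returns [(doc_id, score)] sorted by descending score.
    """
    dim = len(query_embs[0])
    query_vec = [0.0] * dim
    for q in query_embs:
        add_scaled(query_vec, q, 1.0)

    doc_vecs = defaultdict(lambda: [0.0] * dim)
    for hits in hits_per_query_paragraph:
        for rank, (doc_id, p_emb) in enumerate(hits, start=1):
            # RRF weight: higher-ranked paragraphs contribute more to the document vector.
            add_scaled(doc_vecs[doc_id], p_emb, 1.0 / (k + rank))

    scores = {d: sum(a * b for a, b in zip(query_vec, v)) for d, v in doc_vecs.items()}
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
```

Pure rank fusion ignores content once the rankings exist. In this sketch, by contrast, the document vector still depends on the paragraph embeddings, so topical similarity to the whole query document enters the final score.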
Findings
VRRF outperforms traditional rank-based aggregation methods.
PARM achieves higher retrieval effectiveness than document-level retrieval.
Effective training strategies for limited data scenarios are demonstrated.
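As a point of comparison, reciprocal rank fusion (RRF) is one common rank-based way to merge the per-query-paragraph rankings into a single document ranking. The sketch below assumes the following: each ranking is a list of document ids taken from retrieved paragraphs, best first; a document counts only once per list, at the rank of its best paragraph; and k = 60, the value from the original RRF paper. The function name `rrf_aggregate` is hypothetical.

```python
"""Rank-based aggregation of per-query-paragraph rankings via reciprocal rank fusion."""
from collections import defaultdict


def rrf_aggregate(paragraph_rankings, k=60):
    """Fuse several ranked lists into one document ranking.

    paragraph_rankings: one ranked list per query paragraph; each list holds
        the doc id of every retrieved paragraph, best first, so a document
        may appear more than once.
    k: RRF smoothing constant.
    Returns [(doc_id, score)] sorted by descending score.
    """
    scores = defaultdict(float)
    for ranking in paragraph_rankings:
        seen = set()
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in seen:
                continue  # count only the document's best-ranked paragraph in this list
            seen.add(doc_id)
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
```

For example, `rrf_aggregate([["caseA", "caseB", "caseA"], ["caseB", "caseC"]])` ranks `caseB` first because it appears near the top of both lists. The score depends only on ranks, not on the paragraph embeddings.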
Abstract
Dense passage retrieval (DPR) models show great effectiveness gains in first-stage retrieval for the web domain. However, in the web domain we are in a setting with large amounts of training data and a query-to-passage or query-to-document retrieval task. In this paper we investigate dense document-to-document retrieval with limited labelled target data for training, in particular legal case retrieval. In order to use DPR models for document-to-document retrieval, we propose a Paragraph Aggregation Retrieval Model (PARM), which liberates DPR models from their limited input length. PARM retrieves documents on the paragraph level: for each query paragraph, relevant documents are retrieved based on their paragraphs. The relevant results per query paragraph are then aggregated into one ranked list for the whole query document. For the aggregation we propose vector-based aggregation with…
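The paragraph-level retrieval step can be sketched as follows: every query paragraph is matched against every corpus paragraph, and each hit is mapped back to its document. The sketch makes two assumptions. First, a toy bag-of-words `encode` stands in for a trained DPR paragraph encoder. Second, search is exhaustive dot-product scoring rather than an approximate nearest-neighbour index. The names `encode`, `dot` and `retrieve_per_paragraph` are hypothetical.

```python
"""Minimal sketch of PARM-style paragraph-level retrieval.

Assumption: a toy bag-of-words "encoder" stands in for a trained DPR
paragraph encoder; retrieval is exhaustive dot-product search.
"""
from collections import Counter
import math


def encode(paragraph: str) -> Counter:
    # Hypothetical stand-in for a dense encoder: L2-normalised term counts.
    counts = Counter(paragraph.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return Counter({t: v / norm for t, v in counts.items()})


def dot(a: Counter, b: Counter) -> float:
    return sum(w * b[t] for t, w in a.items() if t in b)


def retrieve_per_paragraph(query_paragraphs, corpus, top_n=3):
    """For each query paragraph, rank all corpus paragraphs.

    corpus: dict doc_id -> list of paragraph strings.
    Returns one ranked list of (doc_id, paragraph_index, score) per query paragraph.
    """
    index = [(doc_id, i, encode(p))
             for doc_id, paras in corpus.items()
             for i, p in enumerate(paras)]
    results = []
    for qp in query_paragraphs:
        q = encode(qp)
        scored = [(doc_id, i, dot(q, vec)) for doc_id, i, vec in index]
        scored.sort(key=lambda x: x[2], reverse=True)
        results.append(scored[:top_n])
    return results
```

Because each paragraph is encoded separately, no single input has to fit the whole document. This is the input-length limit that PARM works around. The per-paragraph lists returned here are what an aggregation step then merges into one ranking for the query document.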
