Subtopic-aware View Sampling and Temporal Aggregation for Long-form Document Matching

Youchao Zhou; Heyan Huang; Zhijing Wu; Yuhang Liu; Xinglin Wang

arXiv:2412.07573 · cs.IR · December 25, 2024

Open Access

TL;DR

This paper introduces a subtopic-aware framework for long-form document matching that captures diverse matching signals through multiple document views and uses temporal aggregation to integrate this heterogeneous information, improving performance on tasks such as news duplication detection and legal case retrieval.

Contribution

It proposes a new subtopic-aware view sampling and temporal aggregation method to better model heterogeneous matching signals in long documents.

Findings

1. Effective on news duplication detection
2. Improves legal case retrieval accuracy
3. Outperforms existing hierarchical models

Abstract

Long-form document matching aims to judge the relevance between two documents and has been applied to various scenarios. Most existing works use hierarchical or long-context models to process documents; these achieve a coarse understanding but may ignore details. Some researchers construct a document view from similar sentences about aligned document subtopics to focus on detailed matching signals. However, a long document generally contains multiple subtopics, so the matching signals arising from different subtopics are heterogeneous. Considering only the homologous aligned subtopics may not be representative enough and may cause biased modeling. In this paper, we introduce a new framework to model representative matching signals. First, we propose to capture various matching signals through subtopics of document pairs. Next, we construct multiple document views based on subtopics to cover…
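The abstract describes the pipeline only at a high level (subtopics, then multiple views, then aggregation), and the exact sampling and aggregation procedures are not given here. As a rough illustration only, the following sketch rests on several assumptions that are not from the paper: subtopics are supplied as pre-grouped lists of sentences; sentences are compared with bag-of-words cosine similarity instead of a learned encoder; each view pairs one subtopic of the first document with its most similar subtopic of the second and keeps the top-k sentence pairs; and "temporal aggregation" is approximated by an exponentially weighted recurrent update over the sequence of view scores. All function names (`build_views`, `view_score`, `temporal_aggregate`) and parameters (`k`, `alpha`) are hypothetical.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words vector (hypothetical stand-in for a learned sentence encoder)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(sentences):
    """Summed word counts of a subtopic, used to align subtopics across documents."""
    total = Counter()
    for s in sentences:
        total.update(bow(s))
    return total


def build_views(subtopics_a, subtopics_b, k=2):
    """Assumed view construction: one view per subtopic of document A.

    Each subtopic of A is aligned with its most similar subtopic of B, and the
    view keeps the k most similar cross-document sentence pairs.
    """
    views = []
    for sub_a in subtopics_a:
        ca = centroid(sub_a)
        sub_b = max(subtopics_b, key=lambda sb: cosine(ca, centroid(sb)))
        pairs = [(cosine(bow(x), bow(y)), x, y) for x in sub_a for y in sub_b]
        pairs.sort(key=lambda p: p[0], reverse=True)
        views.append(pairs[:k])
    return views


def view_score(view):
    """Mean similarity of the sentence pairs kept in a view."""
    return sum(p[0] for p in view) / len(view) if view else 0.0


def temporal_aggregate(scores, alpha=0.5):
    """Recurrent update h_t = alpha * h_{t-1} + (1 - alpha) * s_t over the views.

    A simple, order-sensitive stand-in for a learned sequence model; because each
    step is a convex combination, the result stays between min and max score.
    """
    if not scores:
        return 0.0
    h = scores[0]
    for s in scores[1:]:
        h = alpha * h + (1 - alpha) * s
    return h


if __name__ == "__main__":
    doc_a = [["the court ruled on the contract dispute", "damages were awarded"],
             ["the defendant appealed the ruling"]]
    doc_b = [["the contract dispute was settled by the court", "no damages were awarded"],
             ["an appeal was filed by the defendant"]]
    views = build_views(doc_a, doc_b, k=2)
    print(temporal_aggregate([view_score(v) for v in views]))
```

Processing the views as a sequence, rather than averaging them, is what makes the aggregation order-sensitive in this sketch; a learned model would replace both the bag-of-words similarity and the fixed `alpha`.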
