Supervised Contrastive Learning for Interpretable Long-Form Document Matching
Akshita Jha, Vineeth Rakesh, Jaideep Chandrashekar, Adithya Samavedhi, and Chandan K. Reddy

TL;DR
This paper introduces CoLDE, a transformer-based framework utilizing supervised contrastive learning to improve interpretability and accuracy in long-form document matching across various domains.
Contribution
The paper proposes a novel contrastive learning approach with specialized positional embeddings and attention mechanisms for interpretable long document comparison.
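For readers unfamiliar with the term, supervised contrastive learning pulls together embeddings that share a label and pushes apart those that do not. The sketch below shows the generic, widely used form of that loss in NumPy. It is illustrative only: the function name `supervised_contrastive_loss`, the `temperature` default, and the toy inputs are assumptions, and nothing here should be read as the exact objective, positional embeddings, or attention mechanism used by CoLDE.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not CoLDE's exact loss).

    embeddings: array of shape (n, d), one vector per document or chunk.
    labels:     array of shape (n,), items with equal labels are positives.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    n = z.shape[0]

    # L2-normalise so the dot product is cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    # An anchor is never compared with itself.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Positives: same label as the anchor, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0
    if not valid.any():
        return 0.0  # no anchor has a positive, so there is nothing to contrast

    # Row-wise log-softmax over all other items (numerically stable).
    row_max = sim.max(axis=1, keepdims=True)
    log_denominator = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_denominator

    # Average log-probability of the positives for each anchor that has any.
    sum_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    loss_per_anchor = -sum_pos[valid] / pos_counts[valid]
    return float(loss_per_anchor.mean())


# Toy example: two tight clusters of 2-D embeddings.
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_matching = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

When the labels agree with the geometric clusters, the loss is small; when they cut across the clusters, it is much larger. That gap is the signal a contrastive objective uses during training.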
Findings
Outperforms state-of-the-art on multiple long document datasets
Robust to document length variations and text perturbations
Provides fine-grained, interpretable similarity scores
Abstract
Recent advancements in deep learning techniques have transformed the area of semantic text matching. However, most state-of-the-art models are designed to operate with short documents such as tweets, user reviews, comments, etc. These models have fundamental limitations when applied to long-form documents such as scientific papers, legal documents, and patents. When handling such long documents, there are three primary challenges: (i) the presence of different contexts for the same word throughout the document, (ii) small sections of contextually similar text between two documents, but dissimilar text in the remaining parts (this defies the basic understanding of "similarity"), and (iii) the coarse nature of a single global similarity measure which fails to capture the heterogeneity of the document content. In this paper, we describe CoLDE: Contrastive Long Document Encoder - a…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Contrastive Learning
