Structure and Semantics Preserving Document Representations
Natraj Raman, Sameena Shah, Manuela Veloso

TL;DR
This paper introduces a deep metric learning approach that integrates document content and inter-document relationships to improve retrieval, using a novel quintuplet loss and flexible margins for better semantic and structural preservation.
Contribution
It presents a holistic, fine-tunable model that combines intra-document content with inter-document relations using a novel loss function for enhanced retrieval performance.
Findings
Outperforms existing methods on multiple document retrieval datasets.
Effectively encodes both semantic relevance and structural relationships.
Supports query projection during inference for practical application.
Abstract
Retrieving relevant documents from a corpus is typically based on the semantic similarity between the document content and query text. The inclusion of structural relationships between documents can benefit the retrieval mechanism by addressing semantic gaps. However, incorporating these relationships requires tractable mechanisms that balance structure with semantics and take advantage of the prevalent pre-train/fine-tune paradigm. We propose here a holistic approach to learning document representations by integrating intra-document content with inter-document relations. Our deep metric learning solution analyzes the complex neighborhood structure in the relationship network to efficiently sample similar/dissimilar document pairs and defines a novel quintuplet loss function that simultaneously encourages document pairs that are semantically relevant to be closer and structurally…
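The abstract is truncated before it defines the quintuplet loss, so the sketch below is not the paper's formulation. It only illustrates the general idea of a margin-based ranking loss over five documents (an anchor plus four others). The function name `quintuplet_hinge_loss`, the assumed ordering of the four documents from most to least related, and the use of one margin per adjacent pair are all assumptions made for illustration.

```python
import numpy as np


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return float(np.linalg.norm(np.asarray(u, dtype=float) - np.asarray(v, dtype=float)))


def quintuplet_hinge_loss(anchor, ranked_others, margins):
    """Hypothetical ranking loss over an anchor and four other documents.

    ranked_others: four embeddings, ASSUMED to be ordered from most to
        least related to the anchor (this ordering is an illustration,
        not the paper's sampling scheme).
    margins: three non-negative margins, one per adjacent pair in the
        ranking (an assumed stand-in for "flexible margins").
    """
    if len(ranked_others) != 4 or len(margins) != 3:
        raise ValueError("expected four ranked documents and three margins")

    # Distance from the anchor to each of the four ranked documents.
    dists = [euclidean(anchor, x) for x in ranked_others]

    # Hinge penalty whenever a less related document is not at least
    # `margin` farther from the anchor than the next more related one.
    loss = 0.0
    for i, margin in enumerate(margins):
        loss += max(0.0, dists[i] + margin - dists[i + 1])
    return loss
```

In this sketch the loss is zero when the four distances already respect the assumed ordering with the requested margins, and it grows as the ordering is violated. A larger margin for a given pair demands a wider gap between those two documents.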
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
