Specialized Document Embeddings for Aspect-based Similarity of Research Papers
Malte Ostendorff, Till Blume, Terry Ruas, Bela Gipp, Georg Rehm

TL;DR
This paper introduces a scalable method for aspect-based research paper similarity using specialized embeddings, improving recommendation accuracy and addressing biases inherent in generic embeddings.
Contribution
The paper proposes a novel approach to generate aspect-specific document embeddings without segmentation, enhancing scalability and interpretability in research paper recommendations.
Findings
Siamese SciBERT achieved the highest similarity scores.
Aspect-based embeddings reduce implicit dataset and method biases.
The approach scales linearly with corpus size.
Abstract
Document embeddings and similarity measures underpin content-based recommender systems, whereby a document is commonly represented as a single generic embedding. However, similarity computed on single vector representations provides only one perspective on document similarity that ignores which aspects make two documents alike. To address this limitation, aspect-based similarity measures have been developed using document segmentation or pairwise multi-class document classification. While segmentation harms document coherence, the pairwise classification approach scales poorly to large-scale corpora. In this paper, we treat aspect-based similarity as a classical vector similarity problem in aspect-specific embedding spaces. We represent a document not as a single generic embedding but as multiple specialized embeddings. Our approach avoids document segmentation and scales linearly…
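The idea of treating aspect-based similarity as ordinary vector similarity inside separate aspect-specific spaces can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `AspectIndex` and `cosine` and the aspect labels are hypothetical; "method" and "dataset" echo the findings above, and "task" is an assumed extra aspect. The 2-d vectors are made-up placeholders for whatever aspect-specific encoder would produce them. Each document is encoded once per aspect, so indexing grows linearly with corpus size. A query then compares vectors only within one aspect's space, and no pairwise classifier is run.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical aspect labels; each document gets one embedding per aspect.
ASPECTS = ("task", "method", "dataset")

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class AspectIndex:
    """Keeps one separate embedding space (doc_id -> vector) per aspect."""

    def __init__(self, aspects=ASPECTS):
        self.spaces: Dict[str, Dict[str, Vector]] = {a: {} for a in aspects}

    def add(self, doc_id: str, embeddings: Dict[str, Vector]) -> None:
        # One stored vector per (document, aspect): indexing cost is linear in corpus size.
        for aspect, vec in embeddings.items():
            self.spaces[aspect][doc_id] = vec

    def most_similar(self, doc_id: str, aspect: str, k: int = 5) -> List[Tuple[str, float]]:
        # Rank other documents by cosine similarity within the requested aspect only.
        space = self.spaces[aspect]
        query = space[doc_id]
        scores = [(other, cosine(query, vec)) for other, vec in space.items() if other != doc_id]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:k]


# Toy usage: the nearest neighbour of paper_a depends on which aspect is queried.
index = AspectIndex()
index.add("paper_a", {"task": [1.0, 0.0], "method": [0.0, 1.0], "dataset": [1.0, 1.0]})
index.add("paper_b", {"task": [0.9, 0.1], "method": [1.0, 0.0], "dataset": [1.0, 0.9]})
index.add("paper_c", {"task": [0.0, 1.0], "method": [0.1, 0.9], "dataset": [-1.0, 0.0]})
print(index.most_similar("paper_a", "task", k=1))    # top match: paper_b
print(index.most_similar("paper_a", "method", k=1))  # top match: paper_c
```

The brute-force scan in `most_similar` keeps the sketch self-contained. For a large corpus, one would more likely place each aspect's vectors in an approximate nearest-neighbour index, which is an assumption beyond what the abstract states.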
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining
