Specialized Document Embeddings for Aspect-based Similarity of Research Papers

Malte Ostendorff; Till Blume; Terry Ruas; Bela Gipp; Georg Rehm

arXiv:2203.14541 · cs.IR · March 29, 2022

TL;DR

This paper introduces a scalable method for aspect-based research paper similarity using specialized embeddings, improving recommendation accuracy and addressing biases inherent in generic embeddings.

Contribution

The paper proposes a novel approach to generate aspect-specific document embeddings without segmentation, enhancing scalability and interpretability in research paper recommendations.

Findings

01 Siamese SciBERT achieved the highest similarity scores.

02 Aspect-based embeddings reduce implicit dataset and method biases.

03 The approach scales linearly with corpus size.

Abstract

Document embeddings and similarity measures underpin content-based recommender systems, whereby a document is commonly represented as a single generic embedding. However, similarity computed on single vector representations provides only one perspective on document similarity and ignores which aspects make two documents alike. To address this limitation, aspect-based similarity measures have been developed using document segmentation or pairwise multi-class document classification. While segmentation harms document coherence, the pairwise classification approach scales poorly to large-scale corpora. In this paper, we treat aspect-based similarity as a classical vector similarity problem in aspect-specific embedding spaces. We represent a document not as a single generic embedding but as multiple specialized embeddings. Our approach avoids document segmentation and scales linearly…
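The idea of one embedding per aspect, compared with ordinary vector similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the aspect names ("task", "method"), the toy two-dimensional vectors, and the helper names `cosine` and `most_similar` are all assumptions made for illustration. In practice the vectors would presumably come from aspect-specific encoders rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: each document holds one vector per aspect
# instead of a single generic embedding. Values are made up.
docs = {
    "paper_a": {"task": [1.0, 0.0], "method": [0.0, 1.0]},
    "paper_b": {"task": [1.0, 0.0], "method": [1.0, 0.0]},
    "paper_c": {"task": [0.0, 1.0], "method": [0.0, 1.0]},
}


def most_similar(query_id, aspect, docs):
    """Rank all other documents by similarity within one aspect space.

    A single pass over the corpus per query, so the cost of this
    brute-force scan grows linearly with the number of documents.
    """
    query_vec = docs[query_id][aspect]
    scored = [
        (other_id, cosine(query_vec, vecs[aspect]))
        for other_id, vecs in docs.items()
        if other_id != query_id
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # The same query document gets different neighbours per aspect.
    print(most_similar("paper_a", "task", docs))
    print(most_similar("paper_a", "method", docs))
```

In this toy setup the same query paper has a different nearest neighbour depending on the aspect chosen, which is the distinction a single generic embedding cannot express.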

Code & Models

Repositories

malteos/aspect-document-embeddings (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining