Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity
Sheshera Mysore, Arman Cohan, Tom Hope

TL;DR
This paper introduces a novel multi-vector model for fine-grained scientific document similarity, leveraging co-citations as textual supervision to improve matching of specific aspects across papers.
Contribution
It proposes multi-vector document representations trained with co-citation-based textual supervision, together with two aspect matching methods: an efficient method that matches a single pair of aspects, and a method that makes sparse multiple matches via Optimal Transport.
Findings
Improves similarity task performance across four datasets.
Fast single-match method achieves competitive results.
Enables fine-grained similarity analysis in large scientific corpora.
Abstract
We present a new scientific document similarity model based on matching fine-grained aspects of texts. To train our model, we exploit a naturally-occurring source of supervision: sentences in the full-text of papers that cite multiple papers together (co-citations). Such co-citations not only reflect close paper relatedness, but also provide textual descriptions of how the co-cited papers are related. This novel form of textual supervision is used for learning to match aspects across papers. We develop multi-vector representations where vectors correspond to sentence-level aspects of documents, and present two methods for aspect matching: (1) A fast method that only matches single aspects, and (2) a method that makes sparse multiple matches with an Optimal Transport mechanism that computes an Earth Mover's Distance between aspects. Our approach improves performance on document…
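The two matching methods described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2-D "aspect vectors", the function names, the uniform mass over aspects, and the use of an entropy-regularized Sinkhorn iteration as the Optimal Transport solver (with the hypothetical parameters `reg` and `n_iters`) are all assumptions made for illustration.

```python
import math

def pairwise_l2(a, b):
    """L2 distance between every aspect vector in a and every one in b."""
    return [[math.dist(u, v) for v in b] for u in a]

def single_match_distance(a, b):
    """Single-match sketch: distance of the closest pair of aspects."""
    return min(min(row) for row in pairwise_l2(a, b))

def sinkhorn_distance(a, b, reg=0.5, n_iters=200):
    """Multi-match sketch: entropy-regularized approximation of the
    Earth Mover's Distance between two sets of aspect vectors.

    Assumption: each document spreads its mass uniformly over its aspects.
    Very large costs relative to reg can underflow exp(); fine for toy data.
    """
    cost = pairwise_l2(a, b)
    n, m = len(a), len(b)
    r = [1.0 / n] * n  # mass per aspect of document a
    c = [1.0 / m] * m  # mass per aspect of document b
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns toward the target marginals.
        u = [r[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [c[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    dist = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return dist, plan

# Toy documents: each inner list stands in for one sentence-level aspect vector.
doc_a = [[0.0, 0.0], [1.0, 0.0]]
doc_b = [[0.0, 0.1], [1.0, 0.1], [5.0, 5.0]]

single = single_match_distance(doc_a, doc_b)
emd, plan = sinkhorn_distance(doc_a, doc_b)
print(f"single-match distance: {single:.3f}")
print(f"approximate Earth Mover's Distance: {emd:.3f}")
```

The single-match score looks only at the best-aligned pair of aspects, so it is cheap to compute. The transport-based score must also account for aspects that have no close counterpart, such as the third vector in `doc_b`, and therefore can never be smaller than the single-match score.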
Code & Models
- 🤗 allenai/aspire-sentence-embedder · model · 250 dl · ♡ 3
- 🤗 allenai/aspire-biencoder-biomed-scib · model · 17 dl
- 🤗 allenai/aspire-biencoder-biomed-spec · model · 3 dl
- 🤗 allenai/aspire-biencoder-compsci-spec · model · 3 dl · ♡ 1
- 🤗 allenai/aspire-contextualsentence-multim-biomed · model · 3 dl
- 🤗 allenai/aspire-contextualsentence-multim-compsci · model · 7 dl · ♡ 1
- 🤗 allenai/aspire-contextualsentence-singlem-biomed · model · 3 dl
- 🤗 allenai/aspire-contextualsentence-singlem-compsci · model · 4 dl · ♡ 1
- 🤗 idopinto/ot-aspire-biomed-recon · model · 2 dl
- 🤗 idopinto/co-specter-biomed-recon · model
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
