Encoding Multi-Domain Scientific Papers by Ensembling Multiple CLS Tokens
Ronald Seoh, Haw-Shiuan Chang, Andrew McCallum

TL;DR
This paper introduces Multi2SPE, a method that gives a Transformer multiple CLS tokens and sums their embeddings into a single document vector, so the encoder can specialize to corpora spanning several scientific domains. It improves tasks such as citation prediction.
Contribution
It proposes Multi2SPE, an approach that uses multiple CLS tokens to encode multi-domain scientific documents, along with Multi-SciDocs, a new benchmark for testing scientific paper encoders in multi-domain settings.
Findings
Multi2SPE reduces error in multi-domain citation prediction by up to 25%.
It requires minimal additional computation over standard BERT.
The approach improves multi-domain document representations.
Abstract
Many useful tasks on scientific documents, such as topic classification and citation prediction, involve corpora that span multiple scientific domains. Typically, such tasks are accomplished by representing the text with a vector embedding obtained from a Transformer's single CLS token. In this paper, we argue that using multiple CLS tokens could make a Transformer better specialize to multiple scientific domains. We present Multi2SPE: it encourages each of multiple CLS tokens to learn diverse ways of aggregating token embeddings, then sums them up together to create a single vector representation. We also propose our new multi-domain benchmark, Multi-SciDocs, to test scientific paper vector encoders under multi-domain settings. We show that Multi2SPE reduces error by up to 25 percent in multi-domain citation prediction, while requiring only a negligible amount of computation in…
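The aggregation step described in the abstract, summing several CLS embeddings into one document vector, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sum_cls_embeddings`, the assumption that the CLS tokens occupy the first `num_cls` positions of the sequence, and the toy values are all hypothetical, and the training objective that encourages the CLS tokens to diversify is not shown.

```python
from typing import List

Vector = List[float]


def sum_cls_embeddings(hidden_states: List[Vector], num_cls: int) -> Vector:
    """Sum the final-layer embeddings of the first `num_cls` positions.

    Assumption (not from the paper): the CLS tokens are placed at the
    start of the input sequence, one row of `hidden_states` per token.
    """
    if not 1 <= num_cls <= len(hidden_states):
        raise ValueError("num_cls must be between 1 and the sequence length")
    cls_vectors = hidden_states[:num_cls]
    # Element-wise sum across the selected CLS vectors.
    return [sum(components) for components in zip(*cls_vectors)]


# Toy final-layer outputs: two hypothetical CLS tokens, then one ordinary token.
hidden_states = [
    [1.0, 0.0, 2.0],   # CLS token 1
    [0.5, 1.5, -1.0],  # CLS token 2
    [0.0, 2.0, 0.5],   # ordinary token, ignored by the aggregation
]

doc_vector = sum_cls_embeddings(hidden_states, num_cls=2)
print(doc_vector)  # [1.5, 1.5, 1.0]
```

Because the output is a single vector of the same dimension as one CLS embedding, downstream code that expects a single-CLS document embedding would not need to change under this sketch.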
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Residual Connection · Adam · Weight Decay · Byte Pair Encoding · Linear Warmup With Linear Decay
