Semantic maps and metrics for science using deep transformer encoders
Brendan Chambers, James Evans

TL;DR
This paper explores the use of deep transformer models like BERT for mapping scientific literature, demonstrating improved document encoding and community analysis through contextual embeddings and optimized pooling strategies.
Contribution
It introduces a procedure for encoding scientific texts with transformer models, highlighting the importance of domain-matched training data and pooling strategies for better scientific mapping.
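Pooling is the step that turns a transformer's per-token activations into one fixed-length vector per document. The sketch below shows three common choices in plain Python: mean pooling over non-padding tokens, element-wise max pooling, and taking the first-position vector (where BERT places its [CLS] token). It is an illustration under stated assumptions, not the authors' procedure. The function names are hypothetical, and the token vectors are assumed to have already been produced by an encoder (for example, one layer's hidden states) together with a 0/1 attention mask.

```python
from typing import List, Sequence

Vector = List[float]


def _unmasked(tokens: Sequence[Sequence[float]], mask: Sequence[int]) -> List[Sequence[float]]:
    """Return the token vectors whose attention-mask entry is nonzero (drops padding)."""
    kept = [t for t, m in zip(tokens, mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    return kept


def mean_pool(tokens: Sequence[Sequence[float]], mask: Sequence[int]) -> Vector:
    """Average each dimension over the real (non-padding) tokens."""
    kept = _unmasked(tokens, mask)
    return [sum(col) / len(kept) for col in zip(*kept)]


def max_pool(tokens: Sequence[Sequence[float]], mask: Sequence[int]) -> Vector:
    """Take the element-wise maximum over the real (non-padding) tokens."""
    return [max(col) for col in zip(*_unmasked(tokens, mask))]


def first_token_pool(tokens: Sequence[Sequence[float]], mask: Sequence[int]) -> Vector:
    """Return the first position's vector ([CLS] in BERT).

    The mask is unused here; it is accepted only so all strategies share one signature.
    """
    return list(tokens[0])


if __name__ == "__main__":
    # Toy hidden states: 3 real tokens plus 1 padding position, dimension 2.
    hidden = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [9.0, 9.0]]
    mask = [1, 1, 1, 0]
    print(mean_pool(hidden, mask))         # [2.0, 2.0]
    print(max_pool(hidden, mask))          # [3.0, 4.0]
    print(first_token_pool(hidden, mask))  # [1.0, 0.0]
```

Keeping a common signature makes it easy to swap strategies and compare them on the same downstream task.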
Findings
Contextual embeddings outperform static word embeddings in retrieval tasks.
Pooling strategy significantly affects the discriminability of representations.
Domain-matched training data enhances model performance.
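One simple way to check whether a pooling strategy yields discriminable representations is to compare how similar documents are within a known group versus across groups. The sketch below computes that gap with cosine similarity. The metric, the function names, and the group labels (for example, research communities) are illustrative assumptions; this summary does not specify how the paper measures discriminability.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def similarity_gap(vectors: Sequence[Sequence[float]], labels: Sequence[str]) -> float:
    """Mean within-label cosine similarity minus mean between-label similarity.

    Larger values mean documents sharing a (hypothetical) group label sit closer
    together than documents from different groups.
    """
    within: List[float] = []
    between: List[float] = []
    for i, j in combinations(range(len(vectors)), 2):
        s = cosine(vectors[i], vectors[j])
        (within if labels[i] == labels[j] else between).append(s)
    if not within or not between:
        raise ValueError("need at least one within-label and one between-label pair")
    return sum(within) / len(within) - sum(between) / len(between)


if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]
    # Toy document vectors from two hypothetical pooling strategies.
    well_separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    nearly_collapsed = [[1.0, 1.0], [0.9, 1.0], [1.0, 0.9], [1.0, 1.0]]
    print(similarity_gap(well_separated, labels))
    print(similarity_gap(nearly_collapsed, labels))
```

Computing the gap for the same documents under each pooling strategy gives one crude, label-dependent way to rank the strategies.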
Abstract
The growing deluge of scientific publications demands text analysis tools that can help scientists and policy-makers navigate, forecast and beneficially guide scientific research. Recent advances in natural language understanding driven by deep transformer networks offer new possibilities for mapping science. Because the same surface text can take on multiple and sometimes contradictory specialized senses across distinct research communities, sensitivity to context is critical for infometric applications. Transformer embedding models such as BERT capture shades of association and connotation that vary across the different linguistic contexts of any particular word or span of text. Here we report a procedure for encoding scientific documents with these tools, measuring their improvement over static word embeddings in a nearest-neighbor retrieval task. We find discriminability of…
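The nearest-neighbor retrieval comparison mentioned in the abstract can be illustrated with a leave-one-out precision@1 score: for each document vector, find its most cosine-similar other document and check whether the two share a label. Running the same function on static-embedding vectors and on pooled contextual vectors gives a like-for-like comparison. This is a minimal sketch; the labels, the scoring rule, and the function names are assumptions, since the summary does not describe the paper's exact retrieval protocol.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def nearest_neighbor(i: int, vectors: Sequence[Sequence[float]]) -> int:
    """Index of the vector most cosine-similar to vectors[i], excluding i itself."""
    best_j, best_s = -1, -math.inf
    for j, v in enumerate(vectors):
        if j == i:
            continue
        s = cosine(vectors[i], v)
        if s > best_s:
            best_j, best_s = j, s
    return best_j


def precision_at_1(vectors: Sequence[Sequence[float]], labels: Sequence[str]) -> float:
    """Fraction of documents whose nearest other document carries the same label."""
    if len(vectors) < 2:
        raise ValueError("need at least two documents")
    hits = sum(labels[nearest_neighbor(i, vectors)] == labels[i] for i in range(len(vectors)))
    return hits / len(vectors)


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(precision_at_1(docs, ["a", "a", "b", "b"]))  # 1.0
```

Because the score depends only on vector geometry and labels, the same code can evaluate any embedding scheme, which is what makes the static-versus-contextual comparison fair.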
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Advanced Text Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Linear Warmup With Linear Decay · Weight Decay · Adam
