Probing Biomedical Embeddings from Language Models

Qiao Jin; Bhuwan Dhingra; William W. Cohen; Xinghua Lu

arXiv:1904.02181 · cs.CL · April 5, 2019 · 29 citations

Open Access · 1 Repo

TL;DR

This study probes what biomedical language model embeddings encode intrinsically. Used without fine-tuning, BioELMo retains more entity-type and relation information than BioBERT, even though fine-tuned BioBERT performs better on downstream biomedical NER and NLI tasks.

Contribution

The paper compares embeddings from BERT, ELMo, BioBERT and BioELMo in probing tasks, and highlights the unexpected strength of BioELMo as a fixed feature extractor.

Findings

01

Used as a fixed feature extractor (no fine-tuning), BioELMo outperforms BioBERT in the probing tasks.

02

BioELMo encodes entity-type and relational information better than BioBERT.

03

When fine-tuned, BioBERT surpasses BioELMo in biomedical NER and NLI tasks.

Abstract

Contextualized word embeddings derived from pre-trained language models (LMs) show significant improvements on downstream NLP tasks. Pre-training on domain-specific corpora, such as biomedical articles, further improves their performance. In this paper, we conduct probing experiments to determine what additional information is carried intrinsically by the in-domain trained contextualized embeddings. For this we use the pre-trained LMs as fixed feature extractors and restrict the downstream task models to not have additional sequence modeling layers. We compare BERT, ELMo, BioBERT and BioELMo, a biomedical version of ELMo trained on 10M PubMed abstracts. Surprisingly, while fine-tuned BioBERT is better than BioELMo in biomedical NER and NLI tasks, as a fixed feature extractor BioELMo outperforms BioBERT in our probing tasks. We use visualization and nearest neighbor analysis to show that…
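The probing setup described in the abstract (a frozen feature extractor feeding a task model with no extra sequence modeling layers) can be illustrated with a minimal sketch. This is not the authors' code: the synthetic Gaussian vectors stand in for frozen LM embeddings, and the single softmax layer, the helper names and the hyperparameters are all assumptions chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, bias, x):
    """One linear layer: score_c = w_c . x + b_c (no sequence modeling)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def train_linear_probe(features, labels, num_classes, epochs=100, lr=0.5):
    """Fit a softmax probe by batch gradient descent on cross-entropy.

    The features are treated as constants: only the probe's weights and
    bias are updated, mirroring the fixed-feature-extractor setting.
    """
    dim = len(features[0])
    n = len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(class_scores(weights, bias, x))
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for j in range(dim):
                    grad_w[c][j] += err * x[j]
        for c in range(num_classes):
            bias[c] -= lr * grad_b[c] / n
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j] / n
    return weights, bias


def predict(weights, bias, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(weights, bias, x)
    return scores.index(max(scores))


def make_synthetic_embeddings(rng, n_per_class=50, dim=4, noise=0.5):
    """Hypothetical stand-in for frozen embeddings: two Gaussian clusters."""
    features, labels = [], []
    for label, center in enumerate((1.0, -1.0)):
        for _ in range(n_per_class):
            features.append([rng.gauss(center, noise) for _ in range(dim)])
            labels.append(label)
    return features, labels


rng = random.Random(0)
train_x, train_y = make_synthetic_embeddings(rng)
test_x, test_y = make_synthetic_embeddings(rng)
weights, bias = train_linear_probe(train_x, train_y, num_classes=2)
correct = sum(predict(weights, bias, x) == y for x, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print(f"probe accuracy on held-out synthetic vectors: {accuracy:.2f}")
```

In an actual probe, the synthetic vectors would be replaced by per-token outputs of the frozen LM. Because the probe itself has so little capacity, its accuracy is usually read as a rough indicator of how much task-relevant information the embeddings already carry.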

Code & Models

Repositories

Andy-jqa/bioelmo (TensorFlow, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Linear Layer · Sigmoid Activation · Tanh Activation · Weight Decay · Residual Connection · Adam · Layer Normalization · Attention Is All You Need · Dropout