Transfer Learning for Scientific Data Chain Extraction in Small Chemical Corpus with BERT-CRF Model
Na Pang, Li Qian, Weimin Lyu, Jin-Dong Yang

TL;DR
This paper introduces a BERT-CRF-based model for extracting chemical entities and relations from scientific publications, using a newly annotated corpus in the chemical bond domain to improve information extraction for computational chemistry.
Contribution
It presents a novel joint BERT-CRF model and a new chemical corpus for improved entity and relation extraction in chemistry literature.
Findings
Achieved state-of-the-art NER performance on the chemical corpus.
Developed a new annotated corpus for chemical entity and relation extraction.
Demonstrated the effectiveness of the joint extraction model.
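NER performance of the kind reported in the findings above is commonly measured at the entity level: a predicted entity counts as correct only if its boundaries and its type both match the gold annotation. The following is a minimal sketch of that metric, assuming BIO-style tags; the summary does not state which tagging scheme or scoring script the authors used, and the function names `bio_to_spans` and `span_f1` are hypothetical.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into a set of (start, end, type) spans.

    `end` is exclusive. An I- tag that does not continue a span of the
    same type is ignored (an assumption; some scorers treat it as a B-).
    """
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if etype is not None:
                spans.add((start, i, etype))
            start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def span_f1(gold_tags, pred_tags):
    """Entity-level precision, recall and F1 for one tagged sentence."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    true_pos = len(gold & pred)  # exact match on boundaries and type
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["B-compound", "I-compound", "O", "B-solvent"]
pred = ["B-compound", "I-compound", "O", "O"]
print(span_f1(gold, pred))
```

In this example the prediction finds the compound but misses the solvent, so precision is 1.0 and recall is 0.5.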
Abstract
Computational chemistry has developed rapidly in recent years, driven by the growth of and breakthroughs in AI. Thanks to progress in natural language processing, researchers can extract more fine-grained knowledge from publications to stimulate the development of computational chemistry. Because existing work and corpora on chemical entity extraction have been restricted to the biomedical and life science fields rather than chemistry itself, we build a new corpus in the chemical bond field annotated for 7 types of entities: compound, solvent, method, bond, reaction, pKa and pKa value. This paper presents a novel BERT-CRF model that builds scientific chemical data chains by extracting the 7 types of chemical entities and the relations among them from publications. We also propose a joint model to extract the entities and relations simultaneously. Experimental results on our Chemical Special Corpus demonstrate that we achieve…
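To illustrate what the CRF layer contributes in a BERT-CRF tagger, the sketch below decodes the best label sequence with the Viterbi algorithm over per-token emission scores and label-to-label transition scores. It is a toy illustration, not the authors' implementation: in a real model the emissions would come from BERT and the transitions would be learned, whereas here the emissions are hand-written and the transitions are hard BIO constraints. The BIO scheme, the label name `pKa_value`, and the helper names are assumptions.

```python
# The 7 entity types named in the abstract; "pKa_value" is a hypothetical label name.
ENTITY_TYPES = ["compound", "solvent", "method", "bond", "reaction", "pKa", "pKa_value"]
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]


def bio_transitions(labels, penalty=-1e4):
    """Transition scores that penalise I-x unless it follows B-x or I-x."""
    trans = [[0.0] * len(labels) for _ in labels]
    for i, prev in enumerate(labels):
        for j, cur in enumerate(labels):
            if cur.startswith("I-") and prev[2:] != cur[2:]:
                trans[i][j] = penalty
    return trans


def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label index path.

    emissions[t][j]: score of label j at token t.
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(range(n_labels), key=lambda i: score[i])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def row(**scores):
    """Build one emission row; keys use '_' in place of '-' after the prefix."""
    values = [0.0] * len(LABELS)
    for name, value in scores.items():
        values[LABELS.index(name.replace("_", "-", 1))] = value
    return values


# Token 2 slightly prefers I-solvent, which is invalid after B-compound.
emissions = [row(B_compound=5.0), row(I_solvent=3.0, I_compound=2.5), row(O=4.0)]
path = viterbi_decode(emissions, bio_transitions(LABELS))
print([LABELS[i] for i in path])
```

A per-token argmax would label the second token I-solvent; the transition scores steer the decoder to the consistent sequence B-compound, I-compound, O instead.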
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
