EnzChemRED, a rich enzyme chemistry relation extraction dataset
Po-Ting Lai, Elisabeth Coudert, Lucila Aimo, Kristian Axelsen, Lionel Breuza, Edouard de Castro, Marc Feuermann, Anne Morgat, Lucille Pourcel, Ivo Pedruzzi, Sylvain Poux, Nicole Redaschi, Catherine Rivoire, Anastasia Sveshnikova, Chih-Hsuan Wei, Robert Leaman, Ling Luo

TL;DR
EnzChemRED is a new annotated dataset of enzyme-related literature that improves NLP models' ability to extract enzyme functions and reactions, aiding biological knowledge curation.
Contribution
The paper introduces EnzChemRED, a curated dataset for enzyme relation extraction, and demonstrates its effectiveness in enhancing NLP models for enzyme curation tasks.
Findings
Fine-tuning pre-trained language models on EnzChemRED improves named entity recognition (NER) and relation extraction (RE) performance.
Achieved average F1 scores above 83% for enzyme and chemical relation extraction.
Developed an end-to-end pipeline for enzyme knowledge extraction from PubMed abstracts.
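The F1 figures above combine precision and recall. The sketch below shows the standard exact-match definition applied to sets of extracted relations. It assumes each relation is a hypothetical (head_id, tail_id, relation_type) triple with placeholder identifiers. It is not the paper's evaluation script, and it does not reproduce how the reported scores were averaged.

```python
from typing import Set, Tuple

# Illustrative assumption: a relation is a (head_id, tail_id, relation_type) triple.
# This is not EnzChemRED's official representation or scoring code.
Relation = Tuple[str, str, str]


def precision_recall_f1(gold: Set[Relation], predicted: Set[Relation]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between gold and predicted relations."""
    true_positives = len(gold & predicted)  # relations that match exactly in both sets
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Placeholder identifiers, not real ChEBI or UniProtKB entries.
    gold = {("chem_1", "chem_2", "Conversion"), ("chem_3", "chem_4", "Conversion")}
    predicted = {("chem_1", "chem_2", "Conversion"), ("chem_5", "chem_6", "Conversion")}
    print(precision_recall_f1(gold, predicted))
```

Here one of two predictions is correct and one of two gold relations is recovered. Precision, recall, and F1 are therefore all 0.5.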
Abstract
Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI). We show that fine-tuning pre-trained language models with EnzChemRED can significantly boost their ability to identify mentions of proteins and chemicals in text…
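The abstract describes three kinds of annotation: mentions of proteins and chemicals, identifiers from UniProtKB and ChEBI attached to those mentions, and chemical reactions linking them. The minimal sketch below shows one way to hold such annotations in memory. The class names, field names, entity labels, and placeholder identifiers are hypothetical and chosen for illustration. This is not EnzChemRED's actual distribution format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Mention:
    """A text span tagged with an entity type and normalized to an identifier."""
    start: int        # character offset where the mention begins
    end: int          # character offset just past the mention's last character
    text: str
    entity_type: str  # hypothetical labels, e.g. "Protein" or "Chemical"
    identifier: str   # would be a UniProtKB accession or ChEBI ID; placeholders here


@dataclass
class Conversion:
    """A hypothetical substrate -> product relation, optionally linked to an enzyme."""
    substrate_id: str
    product_id: str
    enzyme_id: Optional[str] = None


@dataclass
class AnnotatedAbstract:
    pmid: str
    text: str
    mentions: List[Mention] = field(default_factory=list)
    conversions: List[Conversion] = field(default_factory=list)

    def add_mention(self, surface: str, entity_type: str, identifier: str) -> Mention:
        # Locate the first occurrence of the surface string; str.index raises
        # ValueError if it is absent, so offsets always match the abstract text.
        start = self.text.index(surface)
        mention = Mention(start, start + len(surface), surface, entity_type, identifier)
        self.mentions.append(mention)
        return mention

    def ids_of_type(self, entity_type: str) -> Set[str]:
        # Collect the distinct identifiers annotated with the given entity type.
        return {m.identifier for m in self.mentions if m.entity_type == entity_type}


if __name__ == "__main__":
    doc = AnnotatedAbstract(pmid="00000000", text="Enzyme E converts compound A into compound B.")
    enzyme = doc.add_mention("Enzyme E", "Protein", "UNIPROT_PLACEHOLDER")
    substrate = doc.add_mention("compound A", "Chemical", "CHEBI_PLACEHOLDER_A")
    product = doc.add_mention("compound B", "Chemical", "CHEBI_PLACEHOLDER_B")
    doc.conversions.append(Conversion(substrate.identifier, product.identifier, enzyme.identifier))
    print(doc.conversions)
```

Storing character offsets alongside identifiers keeps each mention tied to its position in the text, which NER training requires. The relation records refer only to identifiers, which matches the goal of curating reactions as links between database entries.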
