Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts
Ruth Dannenfelser, Jeffrey Zhong, Ran Zhang, Vicky Yao

TL;DR
This paper introduces FlaMBé, a comprehensive dataset for extracting procedural knowledge from biomedical texts, especially in single cell research, to improve NLP models and enhance reproducibility in biomedical workflows.
Contribution
The paper presents FlaMBé, the largest expert-curated dataset for procedural knowledge and entity recognition in biomedical texts, addressing a critical gap in structured datasets for this task.
Findings
FlaMBé enables better extraction of workflows from biomedical literature.
The dataset improves named entity recognition and disambiguation for tissue and cell types.
Automating workflow mining can enhance reproducibility in biomedical research.
Abstract
Many of the most commonly explored natural language processing (NLP) information extraction tasks can be thought of as evaluations of declarative knowledge, or fact-based information extraction. Procedural knowledge extraction, i.e., breaking down a described process into a series of steps, has received much less attention, perhaps in part due to the lack of structured datasets that capture the knowledge extraction process from end to end. To address this unmet need, we present FlaMBé (Flow annotations for Multiverse Biological entities), a collection of expert-curated datasets across a series of complementary tasks that capture procedural knowledge in biomedical texts. This dataset is inspired by the observation that one ubiquitous source of procedural knowledge that is described as unstructured text is within academic papers describing their methodology. The workflows annotated in…
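To make the abstract's definition of procedural knowledge extraction concrete (breaking a described process into a series of steps), the following is a minimal sketch of how an extracted workflow might be held in memory. Everything in it is an assumption for illustration: the `Step` and `Workflow` classes, their field names, the example actions, and the placeholder `ExampleTool` are hypothetical and do not reflect FlaMBé's actual annotation schema or file format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Step:
    """One step of a described procedure (hypothetical fields, not the dataset's schema)."""
    order: int                   # 1-based position of the step in the procedure
    action: str                  # what is done at this step
    tool: Optional[str] = None   # software mentioned for the step, if any


@dataclass
class Workflow:
    """An ordered list of steps applied to one sample (illustrative only)."""
    sample: str                              # e.g. a tissue or cell type mention
    steps: List[Step] = field(default_factory=list)

    def add(self, action: str, tool: Optional[str] = None) -> "Workflow":
        # Append a step and number it by its position in the list.
        self.steps.append(Step(len(self.steps) + 1, action, tool))
        return self

    def as_pairs(self) -> List[Tuple[str, str]]:
        # Consecutive (step, next step) pairs, i.e. the flow of the procedure.
        actions = [s.action for s in self.steps]
        return list(zip(actions, actions[1:]))


# Invented example: a methods sentence broken into three ordered steps.
wf = (
    Workflow(sample="example tissue")
    .add("filter cells", tool="ExampleTool")
    .add("normalize")
    .add("cluster")
)
for src, dst in wf.as_pairs():
    print(f"{src} -> {dst}")
```

Storing the procedure as an ordered list and deriving the step-to-step pairs on demand keeps the sketch small; a real annotation scheme may well need branching workflows, which a plain list cannot express.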
Taxonomy
Topics: Biomedical Text Mining and Ontologies
