Pipelines for Procedural Information Extraction from Scientific Literature: Towards Recipes using Machine Learning and Data Science
Huichen Yang, Carlos A. Aguirre, Maria F. De La Torre, Derek Christensen, Luis Bobadilla, Emily Davich, Jordan Roth, Lei Luo, Yihong Theis, Alice Lam, T. Yong-Jin Han, David Buttler, William H. Hsu

TL;DR
This paper presents a machine learning pipeline for extracting procedural recipes from scientific literature. The pipeline produces structured synthesis instructions using open-source tools and emphasizes quality metrics.
Contribution
It introduces a novel pipeline for extracting procedural information from scientific texts, combining semi-supervised learning and structured data transformation.
Findings
Achieved high precision and recall in recipe step extraction
Developed open-source tools for document filtering and payload extraction
Demonstrated effective question answering based on extracted recipes
Abstract
This paper describes a machine learning and data science pipeline for structured information extraction from documents, implemented as a suite of open-source tools and extensions to existing tools. It centers on a methodology for extracting procedural information from published scientific literature in the form of recipes: stepwise procedures for creating an artifact (in this case, synthesizing a nanomaterial). From our overall goal of producing recipes from free text, we derive the technical objectives of a system consisting of pipeline stages: document acquisition and filtering, payload extraction, recipe step extraction as a relationship extraction task, recipe assembly, and presentation through an information retrieval interface with question answering (QA) functionality. This system meets computational information and knowledge management (CIKM) requirements of metadata-driven…
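The stages listed in the abstract can be chained as sketched below. This is a minimal, hypothetical illustration and not the authors' implementation. It assumes plain-Python stand-ins for the learned components: keyword matching replaces the document filter, a small verb lexicon replaces the payload classifier, and a rule that links each action verb to the materials named in the same sentence plays the role of the relationship extraction step. All names (`Document`, `Step`, `Recipe`, `ACTION_FORMS`, `run_pipeline`, `materials_in`) and the lexicons are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lexicons standing in for the paper's learned components.
TOPIC_TERMS = {"nanoparticle", "nanomaterial", "synthesis"}
ACTION_FORMS = {  # surface form -> canonical action
    "dissolved": "dissolve", "added": "add", "stirred": "stir",
    "heated": "heat", "centrifuged": "centrifuge", "washed": "wash",
}
MATERIALS = {"HAuCl4", "sodium citrate", "water", "ethanol"}

@dataclass
class Document:
    doc_id: str
    paragraphs: list[str]

@dataclass
class Step:
    action: str
    materials: list[str]
    position: int  # index of the source sentence within the payload

@dataclass
class Recipe:
    doc_id: str
    steps: list[Step] = field(default_factory=list)

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def filter_documents(docs: list[Document]) -> list[Document]:
    """Stage 1: keep documents that mention at least one topic term."""
    return [d for d in docs
            if any(t in p.lower() for p in d.paragraphs for t in TOPIC_TERMS)]

def extract_payload(doc: Document) -> list[str]:
    """Stage 2: keep paragraphs that contain a known action verb."""
    return [p for p in doc.paragraphs
            if any(w in ACTION_FORMS for w in tokens(p))]

def extract_steps(payload: list[str]) -> list[Step]:
    """Stage 3: link each action to the materials in its sentence
    (a toy rule standing in for a trained relation extractor)."""
    steps, position = [], 0
    for paragraph in payload:
        for sent in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            low = sent.lower()
            # Materials present in the sentence, in order of appearance.
            found = sorted((m for m in MATERIALS if m.lower() in low),
                           key=lambda m: low.index(m.lower()))
            for w in tokens(sent):
                if w in ACTION_FORMS:
                    steps.append(Step(ACTION_FORMS[w], list(found), position))
            position += 1
    return steps

def assemble_recipe(doc: Document, steps: list[Step]) -> Recipe:
    """Stage 4: order steps by where they occur in the text."""
    return Recipe(doc.doc_id, sorted(steps, key=lambda s: s.position))

def run_pipeline(docs: list[Document]) -> list[Recipe]:
    recipes = []
    for doc in filter_documents(docs):
        steps = extract_steps(extract_payload(doc))
        if steps:
            recipes.append(assemble_recipe(doc, steps))
    return recipes

def materials_in(recipe: Recipe) -> list[str]:
    """Stage 5 (toy QA): which materials does the recipe use, in order?"""
    seen = []
    for step in recipe.steps:
        for m in step.materials:
            if m not in seen:
                seen.append(m)
    return seen

example_docs = [
    Document("d1", ["Gold nanoparticle synthesis.",
                    "HAuCl4 was dissolved in water. Sodium citrate was "
                    "added and the solution was stirred."]),
    Document("d2", ["A survey of battery chemistry."]),
]
recipes = run_pipeline(example_docs)
for r in recipes:
    print(r.doc_id, [(s.action, s.materials) for s in r.steps])
```

In this toy run, the off-topic document `d2` is dropped at the filtering stage. The title paragraph of `d1` is excluded from the payload because it contains no action verb. The remaining sentences yield the ordered steps dissolve, add, and stir. Keeping each stage as a separate function mirrors the staged design described in the abstract, so any rule-based stand-in could be swapped for a trained model without changing the rest of the chain.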
