Using a Human-AI Teaming Approach to Create and Curate Scientific Datasets with the SCILIRE System
Necva Bölücü, Jessica Irons, Changhyun Lee, Brian Jin, Maciej Rybinski, Huichen Yang, Andreas Duenser, Stephen Wan

TL;DR
SCILIRE is a Human-AI teaming system that streamlines the creation and curation of scientific datasets from literature, enhancing accuracy and efficiency through iterative review and feedback.
Contribution
The paper introduces SCILIRE, a novel system leveraging Human-AI collaboration principles for scientific data extraction and curation from literature.
Findings
Improves extraction fidelity in dataset creation.
Facilitates efficient and iterative data curation workflows.
Demonstrates effectiveness across multiple scientific domains.
Abstract
The rapid growth of scientific literature has made manual extraction of structured knowledge increasingly impractical. To address this challenge, we introduce SCILIRE, a system for creating datasets from scientific literature. SCILIRE has been designed around Human-AI teaming principles centred on workflows for verifying and curating data. It facilitates an iterative workflow in which researchers can review and correct AI outputs. Furthermore, this interaction is used as a feedback signal to improve future LLM-based inference. We evaluate our design using a combination of intrinsic benchmarking outcomes together with real-world case studies across multiple domains. The results demonstrate that SCILIRE improves extraction fidelity and facilitates efficient dataset creation.
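The review-and-feedback loop described in the abstract can be illustrated with a minimal sketch. The abstract does not say how reviewer corrections feed back into inference, so the mechanism shown here (reusing verified records as few-shot examples in later prompts) is an assumption, and every name in it (`Record`, `FeedbackStore`, `review`, `build_prompt`) is hypothetical rather than part of SCILIRE.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One extracted row: the source passage plus its field values."""
    passage: str
    fields: dict
    verified: bool = False


def review(record: Record, corrections: dict) -> Record:
    """Apply a reviewer's corrections and mark the record as verified."""
    merged = {**record.fields, **corrections}
    return Record(record.passage, merged, verified=True)


@dataclass
class FeedbackStore:
    """Holds human-verified records for reuse as few-shot examples (assumed mechanism)."""
    examples: list = field(default_factory=list)

    def add(self, record: Record) -> None:
        # Only records a human has checked are allowed to influence later prompts.
        if record.verified:
            self.examples.append(record)

    def build_prompt(self, passage: str, k: int = 2) -> str:
        # Use the k most recently verified records as demonstrations.
        shots = self.examples[-k:]
        parts = [f"Text: {r.passage}\nData: {r.fields}" for r in shots]
        parts.append(f"Text: {passage}\nData:")
        return "\n\n".join(parts)


# Illustrative data only: a model draft with a wrong value, fixed by a reviewer.
store = FeedbackStore()
draft = Record("Alloy A melts at 650 C.", {"material": "Alloy A", "melting_point_c": 560})
store.add(draft)  # ignored: not yet verified
store.add(review(draft, {"melting_point_c": 650}))
prompt = store.build_prompt("Alloy B melts at 700 C.")
```

In this sketch the unverified draft never reaches the prompt, while the corrected record becomes a demonstration for the next extraction; a real system would send `prompt` to an LLM and route its output back through review.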
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Biomedical Text Mining and Ontologies
