TL;DR
Datatractor enhances FAIR data practices in chemical and materials sciences by creating a registry of data extractors, standardizing their descriptions, and providing a reference implementation to improve discoverability and usability.
Contribution
The paper introduces a standardized schema and a curated registry for data extractors, so that extractors can be found, installed, and invoked consistently in scientific data workflows.
Findings
Increased discoverability of data extractor tools
Standardized, machine-actionable descriptions of extractors
Reference implementation for data extraction workflows
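To make "machine-actionable descriptions" concrete, here is a minimal sketch of how a registry of extractor records could be queried by file type and turned into a runnable command. It is not the Datatractor schema or its reference implementation: the record fields (`supported_filetypes`, `install_command`, `usage_template`), the extractor names, and the helper functions are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class ExtractorRecord:
    """Hypothetical extractor description; field names are NOT the Datatractor schema."""
    name: str
    supported_filetypes: tuple[str, ...]
    install_command: str  # how a tool would be installed (illustrative only)
    usage_template: str   # command template with $input_path / $file_type placeholders


# A toy in-memory registry with made-up extractors and file types.
REGISTRY = [
    ExtractorRecord(
        name="example-extractor",
        supported_filetypes=("example-format-a", "example-format-b"),
        install_command="pip install example-extractor",
        usage_template="example-extractor --type $file_type $input_path",
    ),
    ExtractorRecord(
        name="other-extractor",
        supported_filetypes=("example-format-b",),
        install_command="pip install other-extractor",
        usage_template="other-extractor $input_path",
    ),
]


def find_extractors(registry: list[ExtractorRecord], file_type: str) -> list[ExtractorRecord]:
    """Discoverability: return every record that declares support for file_type."""
    return [rec for rec in registry if file_type in rec.supported_filetypes]


def build_command(record: ExtractorRecord, input_path: str, file_type: str) -> str:
    """Machine-actionable use: fill the record's usage template with concrete values."""
    # Placeholders a template does not use are simply ignored by substitute().
    return Template(record.usage_template).substitute(
        input_path=input_path, file_type=file_type
    )


if __name__ == "__main__":
    for rec in find_extractors(REGISTRY, "example-format-b"):
        print(rec.install_command)
        print(build_command(rec, "measurement.raw", "example-format-b"))
```

The point of the sketch is the separation of concerns: the record carries everything a program needs to install and call a tool, so a workflow can pick an extractor by file type without tool-specific glue code.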
Abstract
Two key issues hindering the transition towards FAIR data science are the poor discoverability of data extractor tools and the inconsistent instructions for their use, i.e., how we go from raw data files created by instruments to accessible metadata and scientific insight. If existing format conversion tools are hard to find, install, and use, they will be reimplemented, duplicating effort and inevitably increasing the associated maintenance burden. The Datatractor framework presented in this work addresses these issues. First, providing a curated registry of such extractor tools will increase their discoverability. Second, describing them using a standardised but lightweight schema makes their installation and use machine-actionable. Finally, we provide a reference implementation for such data extraction. The Datatractor framework can be used to provide a…
