Toward a Flexible Metadata Pipeline for Fish Specimen Images
Dom Jebbia, Xiaojun Wang, Yasin Bakis, Henry L. Bart Jr., and Jane Greenberg

TL;DR
This paper presents a four-phase approach to developing a flexible metadata pipeline for a large collection of fish specimen images, supporting AI research and adhering to FAIR principles.
Contribution
It introduces a novel four-phased methodology and an RDF graph prototype for flexible metadata management in biological image collections.
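To make the idea of an RDF graph for image metadata concrete, the following is a minimal sketch rather than the authors' prototype. It assumes a hypothetical `http://example.org/fish/` namespace, hypothetical record keys, and an illustrative choice of Dublin Core and Darwin Core terms; the summary does not state which vocabularies the project uses. Flexibility here means the field-to-predicate mapping is plain data, so adding or swapping a vocabulary term does not change the code.

```python
from typing import Dict, List, NamedTuple, Tuple, Union

# Real vocabulary namespaces; choosing them here is an assumption.
DCTERMS = "http://purl.org/dc/terms/"
DWC = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical namespace for illustration only.
EX = "http://example.org/fish/"


class IRI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str


Triple = Tuple[IRI, IRI, Union[IRI, Literal]]

# Hypothetical record keys mapped to predicates. Extending the pipeline
# with a new metadata field means adding one entry here.
FIELD_TO_PREDICATE: Dict[str, str] = {
    "scientific_name": DWC + "scientificName",
    "source": DCTERMS + "source",
    "format": DCTERMS + "format",
}


def image_triples(image_id: str, record: Dict[str, object]) -> List[Triple]:
    """Turn one image record into RDF triples, skipping unmapped keys."""
    subject = IRI(EX + "image/" + image_id)
    return [
        (subject, IRI(predicate), Literal(str(record[key])))
        for key, predicate in FIELD_TO_PREDICATE.items()
        if key in record
    ]


def _nt(term: Union[IRI, Literal]) -> str:
    """Render a term in N-Triples syntax (plain string literals only)."""
    if isinstance(term, IRI):
        return "<" + term.value + ">"
    escaped = (
        term.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{_nt(s)} {_nt(p)} {_nt(o)} ." for s, p, o in triples)


if __name__ == "__main__":
    # Made-up record for illustration; not data from the collection.
    record = {"scientific_name": "Lepomis cyanellus", "source": "Hypothetical Repository"}
    print(to_ntriples(image_triples("img-001", record)))
```

A production pipeline would more likely use an RDF library and typed or language-tagged literals; the standard-library version above only illustrates the triple structure.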
Findings
Successful development of a flexible RDF-based metadata pipeline
Enhanced support for AI tasks like species identification and trait extraction
Framework aligns with FAIR data principles
Abstract
Flexible metadata pipelines are crucial for supporting the FAIR data principles. Despite this need, researchers seldom report their approaches for identifying metadata standards and protocols that support optimal flexibility. This paper reports on an initiative targeting the development of a flexible metadata pipeline for a collection containing over 300,000 digital fish specimen images, harvested from multiple data repositories and fish collections. The images and their associated metadata are being used for AI-related scientific research involving automated species identification, segmentation and trait extraction. The paper provides contextual background, followed by the presentation of a four-phased approach involving: 1. Assessment of the Problem, 2. Investigation of Solutions, 3. Implementation, and 4. Refinement. The work is part of the NSF Harnessing the Data Revolution, Biology…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Research Data Management Practices · Water Quality Monitoring Technologies
