Knowledge engineering for open science: Building and deploying knowledge bases for metadata standards

Mark A. Musen, Martin J. O'Connor, Josef Hardi, and Marcos Martinez-Romero

arXiv:2507.22391 · cs.DL · July 31, 2025

TL;DR

This paper describes the development of knowledge bases and templates that let scientists create rich, standardized metadata for their datasets. The resulting intelligent systems help make data FAIR and promote open science.

Contribution

The paper presents CEDAR, technology for encoding discipline-specific metadata standards as templates and deploying those templates to improve data annotation and interoperability.

Findings

01. CEDAR templates standardize metadata across scientific communities.

02. Templates enable data annotation via web forms and spreadsheets.

03. The system supports adherence to metadata standards in open science.

Abstract

Scientists strive to make their datasets available in open repositories, with the goal that they be findable, accessible, interoperable, and reusable (FAIR). Although it is hard for most investigators to remember all the guiding principles associated with FAIR data, there is one overarching requirement: The data need to be annotated with rich, discipline-specific, standardized metadata. The Center for Expanded Data Annotation and Retrieval (CEDAR) builds technology that enables scientists to encode metadata standards as templates that enumerate the attributes of different kinds of experiments. These metadata templates capture preferences regarding how data should be described and what a third party needs to know to make sense of the datasets. CEDAR templates describing community metadata preferences have been used to standardize metadata for a variety of scientific consortia. They have…
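
To make the idea of a template that "enumerates the attributes" of an experiment concrete, here is a minimal Python sketch of a template and a conformance check for one metadata record. It is an illustration under stated assumptions, not CEDAR's actual template format or API: the class names (`TemplateField`, `MetadataTemplate`), the `validate` method, and the example fields and permitted values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class TemplateField:
    """One attribute that a template enumerates (hypothetical structure)."""
    name: str
    required: bool = False
    # None means free text; otherwise the value must come from this set.
    allowed_values: Optional[frozenset] = None


@dataclass
class MetadataTemplate:
    """A named list of fields standing in for a community metadata standard."""
    name: str
    fields: list = field(default_factory=list)

    def validate(self, record: dict) -> list:
        """Return a list of problems; an empty list means the record conforms."""
        problems = []
        known = {f.name for f in self.fields}
        for f in self.fields:
            value = record.get(f.name, "").strip()
            if not value:
                if f.required:
                    problems.append(f"missing required field: {f.name}")
                continue
            if f.allowed_values is not None and value not in f.allowed_values:
                problems.append(f"{f.name}: '{value}' is not a permitted value")
        # Flag attributes the template does not define.
        for key in record:
            if key not in known:
                problems.append(f"unknown field: {key}")
        return problems


# Illustrative template; the field names and value sets are invented.
sample_template = MetadataTemplate(
    name="Example sample metadata",
    fields=[
        TemplateField("sample_id", required=True),
        TemplateField(
            "organism",
            required=True,
            allowed_values=frozenset({"Homo sapiens", "Mus musculus"}),
        ),
        TemplateField("tissue"),
    ],
)

good = {"sample_id": "S-001", "organism": "Homo sapiens", "tissue": "liver"}
bad = {"sample_id": "", "organism": "human", "colour": "red"}

good_problems = sample_template.validate(good)
bad_problems = sample_template.validate(bad)
```

In this sketch, `good_problems` is empty, while `bad_problems` reports the blank required field, the organism value outside the permitted set, and the attribute the template does not define. Restricting a field to a fixed value set is one simple way to picture standardized values; a real deployment would more plausibly draw permitted values from ontologies or controlled vocabularies.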
