DataJoint 2.0: A Computational Substrate for Agentic Scientific Workflows

Dimitri Yatsenko; Thinh T. Nguyen (DataJoint Inc.; Houston; USA)

arXiv:2602.16585 · cs.DB · February 19, 2026

Open Access

TL;DR

DataJoint 2.0 introduces a relational workflow model that unifies data structure, provenance, and computational dependencies, enabling reliable, agentic scientific workflows with transactional guarantees and extensibility.

Contribution

The paper presents a formal system for scientific workflows that integrates data structure, computational dependencies, and integrity constraints, and extends it with object storage, semantic matching, and distributed coordination.

Findings

01 Unified schema for data and dependencies

02 Enhanced data integrity and provenance tracking

03 Scalable, extensible workflow management

Abstract

Operational rigor determines whether human-agent collaboration succeeds or fails. Scientific data pipelines need the equivalent of DevOps -- SciOps -- yet common approaches fragment provenance across disconnected systems without transactional guarantees. DataJoint 2.0 addresses this gap through the relational workflow model: tables represent workflow steps, rows represent artifacts, foreign keys prescribe execution order. The schema specifies not only what data exists but how it is derived -- a single formal system where data structure, computational dependencies, and integrity constraints are all queryable, enforceable, and machine-readable. Four technical innovations extend this foundation: object-augmented schemas integrating relational metadata with scalable object storage, semantic matching using attribute lineage to prevent erroneous joins, an extensible type system for…
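The core idea in the abstract (tables as workflow steps, rows as artifacts, foreign keys prescribing execution order) can be illustrated with a minimal sketch. This is not DataJoint's API: it uses only Python's standard `sqlite3` and `graphlib` modules, and the table names `session`, `recording`, and `spike_sorting` as well as the helper `execution_order` are hypothetical, chosen for illustration.

```python
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical three-step pipeline; each table is one workflow step.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE session (session_id INTEGER PRIMARY KEY);
CREATE TABLE recording (
    session_id INTEGER PRIMARY KEY REFERENCES session(session_id)
);
CREATE TABLE spike_sorting (
    session_id INTEGER PRIMARY KEY REFERENCES recording(session_id)
);
""")


def execution_order(conn):
    """Derive a valid step order by reading foreign keys from the schema."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    graph = {}
    for table in tables:
        # Column 2 of PRAGMA foreign_key_list is the referenced (parent) table.
        parents = {row[2] for row in conn.execute(
            f'PRAGMA foreign_key_list("{table}")')}
        graph[table] = parents
    # Parents are listed as predecessors, so they come first in the order.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(conn)

# The same foreign keys act as integrity constraints: a downstream
# artifact (row) cannot exist without its upstream artifact.
try:
    conn.execute("INSERT INTO recording (session_id) VALUES (1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

In this sketch the schema is the only source of truth: the execution order is read from the foreign keys rather than declared in a separate workflow file, and the database rejects a `recording` row whose `session` does not exist.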

Taxonomy

Topics: Scientific Computing and Data Management · Semantic Web and Ontologies · Research Data Management Practices