OpenSeisML: Open Large-Scale Real Seismic and well-log Dataset for Generative AI
Ipsita Bhar, Huseyin Tuna Erdinc, Thales Souza, Charles Jones, Felix J. Herrmann

TL;DR
OpenSeisML provides a large-scale, real seismic dataset compiled from publicly available sources to support generative AI applications in seismic inversion and to address the scarcity of realistic velocity models.
Contribution
The paper introduces OpenSeisML, a curated dataset with an automated pipeline for seismic data preparation to support generative AI workflows.
Findings
Curated real seismic datasets from the UK National Data Repository (NDR) for AI research.
Automated pipeline ensures reproducibility of seismic data preparation.
Supports training generative models for uncertainty quantification.
Abstract
The advent of machine learning (ML) and computer vision has significantly accelerated seismic inversion workflows by reducing the computational cost of traditionally expensive iterative methods. However, the development and evaluation of ML methods remain limited by the scarcity of realistic velocity models, as most high-quality data are privately owned by oil and gas companies. To address this gap, we present OpenSeisML, a collection of real seismic datasets designed to support generative AI (Gen-AI) workflows for seismic inversion. The datasets are curated from publicly available surveys in the UK National Data Repository (NDR). When seismic volumes are in the time domain and wells are in depth, a time-to-depth conversion is required. We use checkshot data to establish the time-depth relationship and construct a velocity model through interpolation for accurate conversion of…
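The checkshot-based time-to-depth conversion mentioned in the abstract can be illustrated with a short sketch. This is not the authors' pipeline. It assumes that checkshots are given as depth (m) against two-way time (s), that depth increases monotonically, and that a surface point (0 m, 0 s) is included. It uses linear interpolation between checkshot levels, which is equivalent to a constant interval velocity within each interval. The function names `interval_velocity` and `time_to_depth_trace` are hypothetical.

```python
import numpy as np


def interval_velocity(depth_m, twt_s):
    """Interval velocity (m/s) between consecutive checkshot levels.

    Assumes twt_s is two-way time, so one-way travel time is twt_s / 2.
    """
    depth_m = np.asarray(depth_m, dtype=float)
    twt_s = np.asarray(twt_s, dtype=float)
    return np.diff(depth_m) / (np.diff(twt_s) / 2.0)


def time_to_depth_trace(trace, dt_s, depth_m, twt_s, z_axis_m):
    """Resample a time-domain trace onto a depth axis using checkshots.

    1. Linearly interpolate the checkshot (depth, TWT) pairs to get the TWT at
       every requested depth sample (constant interval velocity per interval).
    2. Linearly interpolate the trace amplitudes at those TWT values.

    Note: np.interp holds end values constant outside the checkshot range,
    so depths below the deepest checkshot are not extrapolated here.
    """
    trace = np.asarray(trace, dtype=float)
    t_axis = np.arange(trace.size) * dt_s          # time of each trace sample
    twt_at_z = np.interp(z_axis_m, depth_m, twt_s)  # TWT for each depth sample
    return np.interp(twt_at_z, t_axis, trace)       # amplitude at that TWT


if __name__ == "__main__":
    # Hypothetical checkshot table: depth (m) vs two-way time (s).
    depth = [0.0, 500.0, 1000.0, 2000.0]
    twt = [0.0, 0.5, 0.9, 1.7]
    print(interval_velocity(depth, twt))  # roughly [2000, 2500, 2500] m/s

    dt = 0.004                                  # 4 ms sampling
    trace = np.sin(2 * np.pi * 25 * np.arange(500) * dt)
    z_axis = np.arange(0.0, 2000.0, 10.0)       # 10 m depth sampling
    depth_trace = time_to_depth_trace(trace, dt, depth, twt, z_axis)
    print(depth_trace.shape)
```

Linear interpolation of time against depth is the simplest choice. A production workflow might instead smooth the interval velocities, extrapolate below the deepest checkshot, or refine the relationship with a well tie.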
