Data-Centric Visual Development for Self-Driving Labs

Anbang Liu; Guanzhong Hu; Jiayi Wang; Ping Guo; Han Liu

arXiv:2512.02018·cs.CV·December 2, 2025

Open Access

TL;DR

This paper presents a hybrid data generation pipeline combining real and virtual images to train robust models for bubble detection in self-driving laboratories, significantly reducing data collection effort while maintaining high accuracy.

Contribution

It introduces a novel hybrid pipeline that fuses real human-in-the-loop data with reference-guided virtual data to address data scarcity in precision-sensitive tasks.

Findings

01. Model trained on real data achieves 99.6% accuracy.

02. Mixing real and virtual data maintains 99.4% accuracy.

03. The approach reduces data collection and review effort.

Abstract

Self-driving laboratories (SDLs) offer a promising path toward reducing the labor-intensive, time-consuming, and often irreproducible workflows in the biological sciences. Yet their stringent precision requirements demand highly robust models whose training relies on large amounts of annotated data. Such data, especially negative samples, is difficult to obtain in routine practice. In this work, we focus on pipetting, the most critical and precision-sensitive action in SDLs. To overcome the scarcity of training data, we build a hybrid pipeline that fuses real and virtual data generation. The real track adopts a human-in-the-loop scheme that couples automated acquisition with selective human verification to maximize accuracy with minimal effort. The virtual track augments the real data using reference-conditioned, prompt-guided image generation, which is further screened and…
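
The abstract's "selective human verification" can be illustrated with a minimal sketch. The excerpt does not say how samples are selected for review, so the confidence-threshold rule below is an assumption, not the authors' method. The names `Prediction` and `route_for_review`, the labels, and the default threshold of 0.9 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    """One automatically acquired image with a model-proposed label (hypothetical schema)."""
    image_id: str
    label: str         # e.g. "bubble" or "no_bubble" (illustrative label names)
    confidence: float  # model confidence in [0, 1]


def route_for_review(
    predictions: List[Prediction], threshold: float = 0.9
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted and human-review queues.

    Assumption: a simple confidence threshold decides which samples a
    human must check. The paper may use a different selection rule.
    """
    auto_accepted: List[Prediction] = []
    needs_review: List[Prediction] = []
    for p in predictions:
        if p.confidence >= threshold:
            auto_accepted.append(p)   # label kept without human effort
        else:
            needs_review.append(p)    # uncertain sample goes to a reviewer
    return auto_accepted, needs_review


if __name__ == "__main__":
    batch = [
        Prediction("img_001", "bubble", 0.97),
        Prediction("img_002", "no_bubble", 0.55),
    ]
    accepted, review = route_for_review(batch)
    print(len(accepted), "auto-accepted;", len(review), "sent to review")
```

Under this assumed rule, raising the threshold sends more images to a human and lowers the risk of mislabeled training data, while lowering it reduces review effort.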

Taxonomy

Topics: Cell Image Analysis Techniques · Scientific Computing and Data Management · Advanced Neural Network Applications