Data-Centric Visual Development for Self-Driving Labs
Anbang Liu, Guanzhong Hu, Jiayi Wang, Ping Guo, Han Liu

TL;DR
This paper presents a hybrid data generation pipeline combining real and virtual images to train robust models for bubble detection in self-driving laboratories, significantly reducing data collection effort while maintaining high accuracy.
Contribution
It introduces a novel hybrid pipeline that fuses real human-in-the-loop data with reference-guided virtual data to address data scarcity in precision-sensitive tasks.
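As a minimal sketch of the human-in-the-loop idea, assume each automatically acquired image arrives with a proposed label and a confidence score, and that only low-confidence items are sent to a person for verification. The `Sample` fields, the `route_for_review` helper, and the 0.9 threshold are all hypothetical names and values chosen for illustration; the paper's actual verification criteria are not given in this summary.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    image_id: str
    auto_label: str    # label proposed by the automated acquisition step (assumed)
    confidence: float  # hypothetical confidence score in [0, 1]


def route_for_review(samples, threshold=0.9):
    """Split samples into an auto-accepted list and a human-review queue.

    Items at or above the (assumed) threshold are accepted as-is;
    everything else is queued for selective human verification.
    """
    accepted, review = [], []
    for s in samples:
        (accepted if s.confidence >= threshold else review).append(s)
    return accepted, review


samples = [
    Sample("img_001", "bubble", 0.98),
    Sample("img_002", "no_bubble", 0.55),
    Sample("img_003", "no_bubble", 0.93),
]
accepted, review = route_for_review(samples)
```

Under these assumptions only `img_002` would reach a human reviewer, which is the sense in which selective verification keeps manual effort low.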
Findings
A model trained on real data achieves 99.6% accuracy.
Mixing real and virtual data during training maintains 99.4% accuracy.
The hybrid approach reduces data collection and review effort.
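One way to picture the mixed-data setting is a training set in which a chosen fraction of items is virtual. The sketch below is illustrative only: the `mix_datasets` helper, the `virtual_fraction` parameter, and the 25% value are assumptions, since the mixing ratio used in the paper is not stated in this summary.

```python
import random


def mix_datasets(real, virtual, virtual_fraction, seed=0):
    """Return a shuffled list of (item, source) pairs.

    All real items are kept; virtual items are sampled so that they make up
    roughly `virtual_fraction` of the result (capped by how many exist).
    """
    if not 0.0 <= virtual_fraction < 1.0:
        raise ValueError("virtual_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_virtual / (n_real + n_virtual) = virtual_fraction for n_virtual.
    n_virtual = round(len(real) * virtual_fraction / (1.0 - virtual_fraction))
    n_virtual = min(n_virtual, len(virtual))
    mixed = [(x, "real") for x in real]
    mixed += [(x, "virtual") for x in rng.sample(virtual, n_virtual)]
    rng.shuffle(mixed)
    return mixed


real_images = [f"real_{i:03d}.png" for i in range(30)]        # placeholder file names
virtual_images = [f"virtual_{i:03d}.png" for i in range(50)]  # placeholder file names
train_set = mix_datasets(real_images, virtual_images, virtual_fraction=0.25)
```

Fixing the seed keeps the sampled virtual subset reproducible, which helps when comparing a real-only model against a mixed one.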
Abstract
Self-driving laboratories (SDLs) offer a promising path toward reducing the labor-intensive, time-consuming, and often irreproducible workflows in the biological sciences. Yet their stringent precision requirements demand highly robust models whose training relies on large amounts of annotated data. Such data, especially negative samples, is difficult to obtain in routine practice. In this work, we focus on pipetting, the most critical and precision-sensitive action in SDLs. To overcome the scarcity of training data, we build a hybrid pipeline that fuses real and virtual data generation. The real track adopts a human-in-the-loop scheme that couples automated acquisition with selective human verification to maximize accuracy with minimal effort. The virtual track augments the real data using reference-conditioned, prompt-guided image generation, which is further screened and…
Taxonomy
Topics: Cell Image Analysis Techniques · Scientific Computing and Data Management · Advanced Neural Network Applications
