CVEvolve: Autonomous Algorithm Discovery for Unstructured Scientific Data Processing
Ming Du, Xiangyu Yin, Yanqi Luo, Dishant Beniwal, Songyuan Tang, Hemant Sharma, Mathew J. Cherukara

TL;DR
CVEvolve is an autonomous, zero-code system that leverages LLMs to discover and improve algorithms for processing complex scientific images, aiding domain scientists without extensive technical expertise.
Contribution
The paper introduces CVEvolve, an autonomous agentic framework that combines a multi-round search strategy with LLMs to discover scientific data-processing algorithms without requiring the user to write code.
Findings
CVEvolve outperforms baseline methods in image registration, peak detection, and segmentation tasks.
Holdout testing helps identify algorithms with better generalization.
The system enables domain scientists to develop practical algorithms from unstructured data.
Abstract
Scientific data processing often requires task-specific algorithms or AI models, creating a barrier for domain scientists who need to analyze their data but may not have extensive computing or image-processing expertise. This barrier is especially pronounced when data are noisy, have a high dynamic range, are sparsely labeled, or are only loosely specified. We introduce CVEvolve, an autonomous agentic harness with a zero-code interface for scientific data-processing algorithm discovery. CVEvolve combines a multi-round search strategy with tools for code execution, evaluation implementation, history management, holdout testing, and optional inspection of scientific data and visual outputs. The search alternates between discovery and improvement actions, and uses lineage-aware stochastic candidate sampling to balance exploration and exploitation. We demonstrate CVEvolve on x-ray…
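
The abstract names lineage-aware stochastic candidate sampling but does not spell out the rule, so the following is only an illustrative sketch of one way such sampling could work, not the authors' implementation. It assumes each candidate records its parent, that candidates created by a discovery action have no parent, and that a higher score is better. The names `Candidate`, `root_of`, `sampling_weights`, `sample_parent`, and the `temperature` parameter are all hypothetical. The weighting used here (a softmax over scores, divided by the size of the candidate's lineage) is an assumption chosen to show how exploitation of good scores can be balanced against exploration of under-sampled lineages.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Candidate:
    cid: int
    parent: Optional[int]  # None: produced by a discovery action (assumption)
    score: float           # higher is better (hypothetical evaluation metric)


def root_of(cid: int, pool: Dict[int, Candidate]) -> int:
    """Follow parent links back to the discovery-stage ancestor."""
    while pool[cid].parent is not None:
        cid = pool[cid].parent
    return cid


def sampling_weights(pool: Dict[int, Candidate],
                     temperature: float = 0.5) -> Dict[int, float]:
    """Softmax over scores, divided by lineage size (assumed rule)."""
    roots = {cid: root_of(cid, pool) for cid in pool}
    family_size: Dict[int, int] = {}
    for r in roots.values():
        family_size[r] = family_size.get(r, 0) + 1
    best = max(c.score for c in pool.values())  # subtract max for stability
    raw = {
        cid: math.exp((c.score - best) / temperature) / family_size[roots[cid]]
        for cid, c in pool.items()
    }
    total = sum(raw.values())
    return {cid: w / total for cid, w in raw.items()}


def sample_parent(pool: Dict[int, Candidate], rng: random.Random,
                  temperature: float = 0.5) -> int:
    """Draw one candidate id to hand to an improvement action."""
    weights = sampling_weights(pool, temperature)
    ids = list(weights)
    return rng.choices(ids, weights=[weights[i] for i in ids], k=1)[0]


if __name__ == "__main__":
    pool = {
        0: Candidate(0, None, 0.50),  # first discovered algorithm
        1: Candidate(1, 0, 0.60),     # improvement of 0
        2: Candidate(2, 1, 0.60),     # improvement of 1
        3: Candidate(3, None, 0.60),  # second, independent discovery
    }
    print(sampling_weights(pool))
    print(sample_parent(pool, random.Random(0)))
```

In this toy pool, candidates 2 and 3 have the same score, but candidate 3 is the only member of its lineage and therefore receives a larger sampling weight than candidate 2, whose lineage already has three members. Lowering `temperature` shifts the draw toward the best-scoring candidates; raising it flattens the distribution.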
