CVEvolve: Autonomous Algorithm Discovery for Unstructured Scientific Data Processing

Ming Du; Xiangyu Yin; Yanqi Luo; Dishant Beniwal; Songyuan Tang; Hemant Sharma; Mathew J. Cherukara

arXiv:2605.11359 · cs.AI · May 13, 2026

TL;DR

CVEvolve is an autonomous, zero-code system that uses LLMs to discover and improve algorithms for processing complex scientific images, helping domain scientists who lack extensive computing or image-processing expertise.

Contribution

The paper introduces CVEvolve, an autonomous agentic framework that combines a multi-round search strategy with LLMs to discover scientific data-processing algorithms without requiring the user to write code.

Findings

1. CVEvolve outperforms baseline methods in image registration, peak detection, and segmentation tasks.

2. Holdout testing helps identify algorithms that generalize better.

3. The system enables domain scientists to develop practical algorithms from unstructured data.

Abstract

Scientific data processing often requires task-specific algorithms or AI models, creating a barrier for domain scientists who need to analyze their data but may not have extensive computing or image-processing expertise. This barrier is especially pronounced when data are noisy, have a high dynamic range, are sparsely labeled, or are only loosely specified. We introduce CVEvolve, an autonomous agentic harness with a zero-code interface for scientific data-processing algorithm discovery. CVEvolve combines a multi-round search strategy with tools for code execution, evaluation implementation, history management, holdout testing, and optional inspection of scientific data and visual outputs. The search alternates between discovery and improvement actions, and uses lineage-aware stochastic candidate sampling to balance exploration and exploitation. We demonstrate CVEvolve on x-ray…
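The abstract names lineage-aware stochastic candidate sampling but does not give its rule. The sketch below is a hypothetical illustration of one way such sampling could balance exploration and exploitation, not the authors' method. It assumes that each candidate algorithm records its score and the root of its lineage, that parents are drawn with weight proportional to exp(score / temperature) (exploitation), and that this weight is divided by a penalty that grows with how often a lineage has already been refined (exploration). The names `Candidate`, `sample_parent`, `search`, `temperature`, and `lineage_penalty` are invented for illustration, and the LLM proposal and evaluation steps are replaced by random scores.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    cid: int               # candidate id (index into the pool)
    parent: Optional[int]  # None for candidates created by a "discover" action
    root: int              # id of the founding candidate of this lineage
    score: float           # evaluation metric in [0, 1], higher is better


def sample_parent(pool: List[Candidate], visits: Dict[int, int],
                  rng: random.Random, temperature: float = 0.1,
                  lineage_penalty: float = 0.5) -> Candidate:
    """Hypothetical rule: weight ~ exp(score / T) / (1 + penalty * visits[root])."""
    best = max(c.score for c in pool)  # subtract the max for numerical stability
    weights = []
    for c in pool:
        exploit = math.exp((c.score - best) / temperature)
        explore = 1.0 / (1.0 + lineage_penalty * visits.get(c.root, 0))
        weights.append(exploit * explore)
    return rng.choices(pool, weights=weights, k=1)[0]


def search(rounds: int, seed: int = 0) -> List[Candidate]:
    """Alternate discover (even rounds) and improve (odd rounds) actions."""
    rng = random.Random(seed)
    pool: List[Candidate] = []
    visits: Dict[int, int] = {}  # how many times each lineage was refined
    for r in range(rounds):
        cid = len(pool)
        if r % 2 == 0 or not pool:
            # "discover": a random score stands in for an LLM-proposed algorithm
            pool.append(Candidate(cid, None, cid, rng.random()))
        else:
            # "improve": a noisy score change stands in for an LLM refinement
            parent = sample_parent(pool, visits, rng)
            visits[parent.root] = visits.get(parent.root, 0) + 1
            child_score = min(1.0, max(0.0, parent.score + rng.gauss(0.02, 0.05)))
            pool.append(Candidate(cid, parent.cid, parent.root, child_score))
    return pool


if __name__ == "__main__":
    pool = search(rounds=20, seed=42)
    best = max(pool, key=lambda c: c.score)
    print(f"best candidate {best.cid} (lineage {best.root}) score={best.score:.3f}")
```

Lowering `temperature` concentrates refinement on the top-scoring candidates, while raising `lineage_penalty` spreads refinement across more lineages; both knobs are assumptions of this sketch.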