Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification Evolves

Halley Young; Nikolaj Björner

arXiv:2604.27209 · cs.SE · May 4, 2026

TL;DR

This paper introduces Comet-H, an iterative prompt automaton that orchestrates language-model-driven research-software development. It targets two failure modes, hallucination accumulation and desynchronization, to keep code, theory, and public claims consistent and reliable.

Contribution

The paper presents a prompt-automation framework with a transparent scoring mechanism that coordinates ideation, implementation, and evaluation in research-software projects.

Findings

1. Achieved an F1 score of 0.768 on a static analysis benchmark.

2. Developed 46 research-software repositories across multiple domains.

3. Found that audit-and-contraction passes dominate successful development trajectories.

Abstract

Large language models can now generate substantial code and draft research text, but research-software projects require more than either artifact alone. The mathematical thesis, executable system, benchmark surface, and public claims must mature together, yet often drift apart. We identify two LM-specific failure modes: hallucination accumulation, in which claims exceed what code or theory supports and unsupported assertions propagate across sessions; and desynchronization, in which code, theory, or the model's own world model fall out of alignment. We propose Comet-H, an iterative prompt automaton that orchestrates ideation, implementation, evaluation, grounding, and paper-writing as coupled coordinates of a single workspace state. At each step, a controller selects the next prompt by scoring it against what the workspace currently lacks, carries unfinished follow-up work forward…
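
The controller step described in the abstract (score each candidate prompt against what the workspace currently lacks, and carry unfinished follow-up work forward) can be illustrated with a minimal sketch. This is not Comet-H's implementation: the `Prompt` and `Workspace` classes, the maturity values in [0, 1], the linear scoring rule, and the policy of giving carried-over work priority are all assumptions made for illustration. Only the five coordinate names come from the abstract.

```python
from dataclasses import dataclass, field

# Coordinate names follow the activities listed in the abstract.
COORDINATES = ("ideation", "implementation", "evaluation", "grounding", "paper_writing")


@dataclass
class Prompt:
    """Hypothetical prompt: a name plus the gain it is expected to give each coordinate."""
    name: str
    advances: dict  # coordinate -> expected gain in [0, 1] (illustrative numbers)


@dataclass
class Workspace:
    """Hypothetical workspace state: per-coordinate maturity and carried follow-ups."""
    maturity: dict  # coordinate -> current maturity in [0, 1]
    carried: list = field(default_factory=list)  # names of unfinished prompts


def score(prompt, ws):
    """Assumed scoring rule: expected gain weighted by how much each coordinate lacks."""
    return sum(gain * (1.0 - ws.maturity[c]) for c, gain in prompt.advances.items())


def select_next(prompts, ws):
    """Pick the highest-scoring prompt; carried follow-ups go first (an assumption)."""
    pool = [p for p in prompts if p.name in ws.carried] or prompts
    return max(pool, key=lambda p: score(p, ws))


def step(prompts, ws, finished=True):
    """Run one controller step and return the name of the chosen prompt."""
    chosen = select_next(prompts, ws)
    if finished:
        # Finished work raises the maturity of the coordinates it advances.
        for c, gain in chosen.advances.items():
            ws.maturity[c] = min(1.0, ws.maturity[c] + gain)
        if chosen.name in ws.carried:
            ws.carried.remove(chosen.name)
    elif chosen.name not in ws.carried:
        # Unfinished work is carried forward to the next step.
        ws.carried.append(chosen.name)
    return chosen.name


if __name__ == "__main__":
    ws = Workspace(maturity={"ideation": 0.9, "implementation": 0.2,
                             "evaluation": 0.1, "grounding": 0.5,
                             "paper_writing": 0.8})
    prompts = [Prompt("draft_paper", {"paper_writing": 0.5}),
               Prompt("run_benchmark", {"evaluation": 0.5}),
               Prompt("audit_claims", {"grounding": 0.4})]
    print(step(prompts, ws, finished=False), ws.carried)
```

In this toy setup the evaluation coordinate is the least mature, so the benchmark prompt scores highest and wins over drafting more paper text. A scoring rule of this shape is easy to inspect, since every selection can be traced to named gaps in the workspace.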
