Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs

Santosh Premi Adhikari; Radu Timofte; Dmitry Ignatov

arXiv:2605.04903·cs.LG·May 7, 2026

TL;DR

This paper introduces Delta-Code Generation, a delta-based approach to neural architecture search in which a fine-tuned LLM refines a baseline model by emitting code diffs instead of synthesizing a full model, cutting output length by 75-85% and lowering computational cost.

Contribution

The paper presents a delta-based fine-tuning pipeline in which LLMs generate compact architecture refinements, improving the efficiency and diversity of neural architecture search.

Findings

01 Delta-based generation surpasses the full-generation baseline in both valid rate and first-epoch accuracy.

02 Output length drops by 75-85%, saving computational resources.

03 Generated candidates achieve high first-epoch accuracy, demonstrating effective architecture refinement.
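
To make the output-length finding concrete, here is a minimal sketch of what a code delta looks like next to a full model file. It uses Python's standard `difflib` module to build a unified diff. The toy `Net` class, the BatchNorm refinement, the file names `a/model.py` and `b/model.py`, and the helper `make_delta` are all hypothetical illustrations; none are taken from the paper or from LEMUR. The printed sizes describe this toy example only, not the reported 75-85% figure.

```python
import difflib

# Hypothetical baseline architecture source (illustrative only).
BASELINE = """\
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))
"""

# Hypothetical refinement: add batch normalization after the convolution.
REFINED = BASELINE.replace(
    "            nn.ReLU(),\n",
    "            nn.BatchNorm2d(32),\n            nn.ReLU(),\n",
)


def make_delta(old: str, new: str, context: int = 1) -> str:
    """Return a unified diff that turns `old` into `new`."""
    diff_lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="a/model.py",
        tofile="b/model.py",
        n=context,  # number of unchanged context lines around each change
    )
    return "".join(diff_lines)


delta = make_delta(BASELINE, REFINED)
print(delta)
print(f"full model: {len(REFINED)} chars, delta: {len(delta)} chars")
```

A model that emits only the delta has to produce the changed line plus a little context, whereas full generation must reproduce every unchanged line of the file as well.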

Abstract

Large language models (LLMs) show strong potential for neural architecture generation, yet existing approaches produce complete model implementations from scratch, which is computationally expensive and yields verbose code. We propose Delta-Code Generation, where fine-tuned LLMs generate compact unified diffs (deltas) to refine baseline architectures rather than synthesizing entire models. Our pipeline iteratively fine-tunes the LLM via LoRA on curated architectures from the LEMUR dataset, with MinHash-Jaccard novelty filtering for structural diversity. We evaluate three 7B-class LLMs (DeepSeek-Coder-7B, Qwen2.5-Coder-7B, and Mistral-7B) across six datasets (CIFAR-10, CIFAR-100, MNIST, SVHN, ImageNette, CelebA) using a 22-cycle protocol (1,100 candidates per LLM). All three substantially surpass the full-generation baseline (50.6% valid rate, 42.3% mean first-epoch accuracy):…
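
The abstract mentions MinHash-Jaccard novelty filtering without giving details, so the following is only a rough sketch of how such a filter could work, not the authors' implementation. It assumes token 3-gram shingles, 64 hash functions built from salted `hashlib.blake2b`, and a similarity threshold of 0.9. Those values, and the helper names `shingles`, `minhash`, `estimated_jaccard`, and `novelty_filter`, are hypothetical choices made for illustration.

```python
import hashlib
import re

NUM_PERM = 64  # assumed number of hash functions; not specified in the source


def shingles(code: str, k: int = 3) -> set[str]:
    """Split code into overlapping k-token shingles."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash(items: set[str], num_perm: int = NUM_PERM) -> list[int]:
    """MinHash signature: the minimum hash value under each salted hash."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a distinct salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def novelty_filter(candidates: list[str], threshold: float = 0.9) -> list[str]:
    """Keep a candidate only if it is below `threshold` similarity to all kept ones."""
    kept, kept_sigs = [], []
    for code in candidates:
        sig = minhash(shingles(code))
        if all(estimated_jaccard(sig, other) < threshold for other in kept_sigs):
            kept.append(code)
            kept_sigs.append(sig)
    return kept


# Usage: an exact duplicate is dropped, a structurally different candidate is kept.
model_a = "self.conv = nn.Conv2d(3, 32, 3)\nself.act = nn.ReLU()"
model_b = "layers = [Dense(128), Dropout(0.5), Dense(10)]"
print(len(novelty_filter([model_a, model_a, model_b])))
```

Comparing fixed-length signatures instead of full shingle sets keeps each similarity check cheap, which matters when every new candidate is compared against all previously accepted ones.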
