LLM as a Neural Architect: Controlled Generation of Image Captioning Models Under Strict API Contracts
Krunal Jesani, Dmitry Ignatov, Radu Timofte

TL;DR
This paper introduces NN-Caption, an LLM-guided neural architecture search pipeline that automatically generates and evaluates image captioning models, demonstrating the potential of LLMs to automate model design under strict API constraints.
Contribution
The work presents a novel LLM-guided NAS pipeline for image captioning, comprising prompt templates, evaluation methods, and integration with open datasets, as a step toward automated model generation.
Findings
Over half of the generated models trained successfully and produced meaningful captions.
Providing more input model snippets in the prompt (10 vs. 5) slightly decreases the success rate.
The pipeline effectively integrates prompt-based code generation with automatic evaluation.
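Because success is measured on LLM-generated code that must conform to a strict Net API, a pipeline like this needs some way to reject non-conforming output before spending GPU time on training. One plausible first gate (not necessarily what NN-Caption does) is a static check that the generated file parses and defines the expected class and methods. The sketch below uses Python's standard `ast` module; the class name `Net` comes from the abstract, but the function `check_contract` and the required method names (`train_setup`, `learn`, `forward`) are hypothetical placeholders, not the actual LEMUR/NN-Caption API.

```python
import ast

# Hypothetical contract for illustration only: the real Net API used by
# NN-Caption / LEMUR may require different members.
REQUIRED_CLASS = "Net"
REQUIRED_METHODS = {"__init__", "train_setup", "learn", "forward"}


def check_contract(source: str) -> list[str]:
    """Return contract violations in generated model code (empty list = pass)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]

    # Only top-level class definitions are considered.
    classes = {node.name: node for node in tree.body if isinstance(node, ast.ClassDef)}
    if REQUIRED_CLASS not in classes:
        return [f"missing class {REQUIRED_CLASS!r}"]

    # Methods defined directly in the class body (inherited ones are not seen).
    defined = {
        item.name
        for item in classes[REQUIRED_CLASS].body
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    return [f"missing method {name!r}" for name in sorted(REQUIRED_METHODS - defined)]


good = '''
class Net:
    def __init__(self, prm): ...
    def train_setup(self, prm): ...
    def learn(self, train_data): ...
    def forward(self, images, captions=None): ...
'''
print(check_contract(good))                                       # []
print(check_contract("class Net:\n    def forward(self, x): ..."))  # three missing methods
print(check_contract("def broken(:"))                              # reports a syntax error
```

A static check like this only catches structural violations; shape mismatches or runtime errors inside `forward` would still surface only when the model is actually instantiated and trained.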
Abstract
Neural architecture search (NAS) traditionally requires significant human expertise or automated trial-and-error to design deep learning models. We present NN-Caption, an LLM-guided neural architecture search pipeline that generates runnable image-captioning models by composing CNN encoders from LEMUR's classification backbones with sequence decoders (LSTM/GRU/Transformer) under a strict Net API. Using DeepSeek-R1-0528-Qwen3-8B as the primary generator, we present the prompt template and examples of generated architectures. We evaluate on MS COCO with BLEU-4. The LLM generated dozens of captioning models, with over half successfully trained and producing meaningful captions. We analyse the outcomes of using different numbers of input model snippets (5 vs. 10) in the prompt, finding a slight drop in success rate when providing more candidate components. We also report training dynamics…
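The abstract reports BLEU-4 on MS COCO without naming the implementation used. As a reference for what the metric computes, here is a minimal pure-Python sketch of the standard corpus-level BLEU-4: clipped n-gram precision for n = 1..4 combined by a uniform geometric mean, times a brevity penalty, with no smoothing. The function names are illustrative, and standard evaluation toolkits apply their own tokenization and preprocessing, so scores from this sketch are not directly comparable to the paper's reported numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(candidates, references):
    """Corpus-level BLEU-4 with uniform weights and no smoothing.

    candidates: list of token lists (one generated caption per image)
    references: list of lists of token lists (reference captions per image)
    """
    clipped = [0] * 4  # clipped n-gram matches for n = 1..4
    totals = [0] * 4   # candidate n-gram counts for n = 1..4
    cand_len = ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Effective reference length: closest to the candidate, shorter wins ties.
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, 5):
            cand_counts = ngrams(cand, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # element-wise max over references
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    if min(clipped) == 0:
        return 0.0  # some n-gram order has no matches; unsmoothed BLEU is zero
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / 4
    brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity * math.exp(log_precision)


cand = "a man rides a horse".split()
refs = ["a man rides a horse on the beach".split()]
print(corpus_bleu4([cand], [refs]))  # ≈ 0.549: perfect precision, penalty exp(1 - 8/5)
```

The example shows why short captions are penalized even when every n-gram matches: the 5-token candidate against an 8-token reference keeps only about 55% of its score.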
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Artificial Intelligence in Healthcare and Education
