From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation
Guang Yang, Xing Hu, Xiang Chen, Xin Xia

TL;DR
This paper investigates the reliability of multimodal language models in translating circuit diagrams into RTL code, revealing a covert bypass phenomenon called Mirage and proposing VeriGround to improve trustworthiness.
Contribution
The paper identifies Mirage as a hidden defect in vision-to-code models, introduces C2VEVAL for evaluation, and presents VeriGround, a training method that enhances visual grounding and robustness.
Findings
Mirage causes models to bypass visual input using identifier semantics.
VeriGround significantly reduces false refusals and improves accuracy under anonymized conditions.
With only 4B parameters, VeriGround performs comparably to GPT-5.4.
Abstract
Multimodal large language models (MLLMs) are increasingly used to translate visual artifacts into code, from UI mockups into HTML to scientific plots into Python scripts. A circuit diagram can be viewed as a visual domain-specific language for hardware: it encodes timing, topology, and bit-level semantics that are invisible to casual inspection yet safety-critical once fabricated in silicon. Translating such diagrams into register-transfer-level (RTL) code therefore represents an extreme reliability test for vision-to-code generation. We reveal a phenomenon we call Mirage: replacing a circuit diagram with a blank image leaves Pass@k unchanged or even higher, because models bypass the visual input and instead exploit identifier semantics in the module header to retrieve canonical RTL templates. This constitutes a new, highly covert class of defect in AI-assisted code generation that…
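The blank-image comparison described in the abstract can be illustrated with a minimal sketch. It assumes the widely used unbiased Pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples per task and c the number that pass; the abstract does not say which estimator the paper uses. The function names (`pass_at_k`, `mean_pass_at_k`, `mirage_gap`) and the sample counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate for one task: 1 - C(n-c, k) / C(n, k)."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average Pass@k over tasks; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


def mirage_gap(with_diagram, with_blank, k: int) -> float:
    """Pass@k with the real diagram minus Pass@k with a blank image.

    A gap near zero or below zero means the diagram contributed
    nothing measurable, which is the symptom the abstract calls Mirage.
    """
    return mean_pass_at_k(with_diagram, k) - mean_pass_at_k(with_blank, k)


# Made-up (n, c) counts for two tasks, for illustration only.
diagram_runs = [(10, 6), (10, 3)]
blank_runs = [(10, 7), (10, 3)]
gap = mirage_gap(diagram_runs, blank_runs, k=1)
print(f"Pass@1 gap (diagram - blank): {gap:+.3f}")
```

A clearly positive gap would indicate that the model actually relies on the diagram; the sketch says nothing about how large a gap is needed for that conclusion to be statistically meaningful.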
