EmbedAgent: Benchmarking Large Language Models in Embedded System Development
Ruiyang Xu, Jialun Cao, Mingyuan Wu, Wenliang Zhong, Yaojie Lu, Ben He, Xianpei Han, Shing-Chi Cheung, Le Sun

TL;DR
This paper introduces EmbedAgent and Embedbench to evaluate large language models in embedded system development, revealing current limitations and proposing strategies to enhance their performance in real-world tasks.
Contribution
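The paper presents EmbedAgent, a paradigm that places LLMs in real-world embedded development roles (Programmer, Architect, and Integrator), and Embedbench, a benchmark covering embedded system programming, circuit design, and cross-platform migration. It also proposes retrieval and feedback strategies to improve LLM performance on these tasks.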
Findings
DeepSeek-R1 achieves 55.6% pass@1 when given schematic information
MicroPython on Raspberry Pi Pico reaches 73.8% pass@1
The proposed strategies improve DeepSeek-R1 to 65.1% pass@1 and raise migration accuracy to 27.8%
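
The findings above are all reported as pass@1. This page does not state how Embedbench computes that number, so the following is only a minimal sketch. It assumes the unbiased pass@k estimator commonly used in code-generation benchmarks, where n solutions are sampled per task and c of them pass. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    n: samples generated for one task, c: samples that passed, k: budget.
    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average the per-task estimates; results holds (n, c) per task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n per task, so a benchmark-level pass@1 is the mean fraction of passing samples across tasks.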
Abstract
Large Language Models (LLMs) have shown promise in various tasks, yet few benchmarks assess their capabilities in embedded system development. In this paper, we introduce EmbedAgent, a paradigm designed to simulate real-world roles in embedded system development, such as Embedded System Programmer, Architect, and Integrator. This paradigm enables LLMs to be tested in tasks that bridge the gap between digital and physical systems, allowing for a more comprehensive assessment of their capabilities. To evaluate LLMs on these tasks, we propose Embedbench, the first comprehensive benchmark for embedded system programming, circuit design, and cross-platform migration. Embedbench consists of 126 cases, covering 9 electronic components across 3 hardware platforms. Through extensive experiments on 10 mainstream LLMs, we uncover several key findings. Surprisingly, despite the simplicity of the…
Taxonomy
Topics: Model-Driven Software Engineering Techniques
