Exploring and Characterizing Large Language Models For Embedded System Development and Debugging

Zachary Englhardt; Richard Li; Dilini Nissanka; Zhihan Zhang; Girish Narayanswamy; Joseph Breda; Xin Liu; Shwetak Patel; Vikram Iyer

arXiv:2307.03817 · cs.SE · November 23, 2023 · 1 citation

TL;DR

This paper systematically evaluates large language models like GPT-3.5, GPT-4, and PaLM 2 for embedded system development, revealing GPT-4's strong cross-domain reasoning and proposing a human-AI workflow that enhances productivity and success in embedded programming tasks.

Contribution

The study introduces an open source hardware-in-the-loop framework to assess LLMs for embedded systems and develops a human-AI workflow that significantly improves development success rates.

Findings

01

In some cases, GPT-4 generates fully correct embedded code from a single prompt.

02

GPT-4 produces functional I2C interfaces 66% of the time.

03

The human-AI workflow raises the success rate for building an environmental sensor from 25% to 100%.

Abstract

Large language models (LLMs) have shown remarkable abilities to generate code; however, their ability to develop software for embedded systems, which requires cross-domain knowledge of hardware and software, has not been studied. In this paper, we develop an extensible, open source hardware-in-the-loop framework to systematically evaluate leading LLMs (GPT-3.5, GPT-4, PaLM 2) and assess their capabilities and limitations for embedded system development. We observe through our study that even when these tools fail to produce working code, they consistently generate helpful reasoning about embedded design tasks. We leverage this finding to study how human programmers interact with these tools, and develop a human-AI based software engineering workflow for building embedded systems. Our evaluation platform for verifying LLM generated programs uses sensor actuator pairs for physical…

Taxonomy

Topics: Green IT and Sustainability · Software Engineering Research · Age of Information Optimization

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Absolute Position Encodings · Adam · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Label Smoothing