Morescient GAI for Software Engineering (Extended Version)

Marcus Kessel; Colin Atkinson

arXiv:2406.04710 · cs.SE · January 7, 2025

TL;DR

This paper proposes a new class of Generative AI models, called "Morescient", that are aware of (i.e., trained on) both the semantic and static facets of software, with the aim of making them more trustworthy in software engineering tasks.

Contribution

It introduces the concept of 'Morescient' GAI models that incorporate semantic awareness and outlines a roadmap for their development and open dissemination.

Findings

1. Identifies the limitations of current code models trained only on the syntactic facet of software.
2. Proposes a new class of models trained on both the semantic and static facets of software.
3. Calls for a new generation of software observation platforms capable of generating structured execution observations.

Abstract

The ability of Generative AI (GAI) technology to automatically check, synthesize and modify software engineering artifacts promises to revolutionize all aspects of software engineering. Using GAI for software engineering tasks is consequently one of the most rapidly expanding fields of software engineering research, with over a hundred LLM-based code models having been published since 2021. However, the overwhelming majority of existing code models share a major weakness: they are exclusively trained on the syntactic facet of software, significantly lowering their trustworthiness in tasks dependent on software semantics. To address this problem, a new class of "Morescient" GAI is needed that is "aware" of (i.e., trained on) both the semantic and static facets of software. This, in turn, will require a new generation of software observation platforms capable of generating large…
