N-Version Assessment and Enhancement of Generative AI

Marcus Kessel; Colin Atkinson

arXiv:2409.14071 · cs.SE · October 22, 2024

TL;DR

This paper introduces a differential GAI (D-GAI) approach that generates multiple versions of code and tests and compares them, so that quality evaluation does not depend on any single generated artifact. It also presents LASSO, a platform for executing and analyzing large sets of versions.

Contribution

The paper presents the D-GAI method and the Large-Scale Software Observatorium (LASSO) platform, which together support verification of GAI-generated code through version diversity and large-scale execution and analysis.

Findings

01. D-GAI improves reliability of GAI outputs.

02. LASSO enables large-scale evaluation of code versions.

03. Differential analysis enhances trust in GAI-generated artifacts.

Abstract

Generative AI (GAI) holds great potential to improve software engineering productivity, but its untrustworthy outputs, particularly in code synthesis, pose significant challenges. The need for extensive verification and validation (V&V) of GAI-generated artifacts may undermine the potential productivity gains. This paper proposes a way of mitigating these risks by exploiting GAI's ability to generate multiple versions of code and tests to facilitate comparative analysis across versions. Rather than relying on the quality of a single test or code module, this "differential GAI" (D-GAI) approach promotes more reliable quality evaluation through version diversity. We introduce the Large-Scale Software Observatorium (LASSO), a platform that supports D-GAI by executing and analyzing large sets of code versions and tests. We discuss how LASSO enables rigorous evaluation of GAI-generated…
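The comparative analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use LASSO; the function names (`run_version`, `differential_report`), the three `clamp` variants standing in for GAI-generated versions, and the majority-vote scoring rule are all assumptions made for illustration. Each version is run on the same inputs, and a version is scored by how often its observed behavior matches the most common behavior across versions.

```python
from collections import Counter
from typing import Callable, Dict, Sequence, Tuple


def run_version(func: Callable, args: Tuple) -> Tuple[str, str]:
    """Run one version on one input and record its observable outcome."""
    try:
        return ("ok", repr(func(*args)))
    except Exception as exc:  # a crash is also an observable behavior
        return ("error", type(exc).__name__)


def differential_report(
    versions: Dict[str, Callable], inputs: Sequence[Tuple]
) -> Dict[str, float]:
    """Return, per version, the fraction of inputs on which it agrees
    with the majority outcome across all versions."""
    agreement = {name: 0 for name in versions}
    for args in inputs:
        observed = {name: run_version(f, args) for name, f in versions.items()}
        majority, _ = Counter(observed.values()).most_common(1)[0]
        for name, outcome in observed.items():
            if outcome == majority:
                agreement[name] += 1
    return {name: count / len(inputs) for name, count in agreement.items()}


# Hypothetical stand-ins for three generated versions of clamp(x, lo, hi).
def clamp_minmax(x, lo, hi):
    return max(lo, min(x, hi))


def clamp_branches(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def clamp_swapped(x, lo, hi):
    return min(lo, max(x, hi))  # deliberately faulty: min and max are swapped


VERSIONS = {
    "v1_minmax": clamp_minmax,
    "v2_branches": clamp_branches,
    "v3_swapped": clamp_swapped,
}
INPUTS = [(5, 0, 10), (-1, 0, 10), (11, 0, 10)]

scores = differential_report(VERSIONS, INPUTS)
for name, score in scores.items():
    print(f"{name}: agrees with majority on {score:.0%} of inputs")
```

In this sketch the faulty version stands out because it disagrees with the other two on most inputs. Majority agreement is only a proxy for correctness: if most versions share the same fault, the vote will favor the wrong behavior, which is one reason diversity among versions and tests matters.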
