AI-assisted coding: Experiments with GPT-4
Russell A Poldrack, Thomas Lu, and Gašper Beguš

TL;DR
This paper evaluates GPT-4's capabilities in AI-assisted coding, highlighting its strengths in code generation and refactoring, while emphasizing the necessity of human validation for accuracy and reliability.
Contribution
It provides empirical insights into GPT-4's effectiveness in code generation, refactoring, and testing, demonstrating both its potential and current limitations.
Findings
GPT-4 can generate working code, but the output requires substantial human validation to ensure accurate performance.
Refactoring with GPT-4 can significantly improve code along several established code quality metrics.
GPT-4 can generate tests with substantial coverage, but many of those tests fail when applied to the associated code.
Abstract
Artificial intelligence (AI) tools based on large language models have achieved human-level performance on some computer programming tasks. We report several experiments using GPT-4 to generate computer code. These experiments demonstrate that AI code generation using the current generation of tools, while powerful, requires substantial human validation to ensure accurate performance. We also demonstrate that GPT-4 refactoring of existing code can significantly improve that code along several established metrics for code quality, and we show that GPT-4 can generate tests with substantial coverage, but that many of the tests fail when applied to the associated code. These findings suggest that while AI coding tools are very powerful, they still require humans in the loop to ensure validity and accuracy of the results.
Taxonomy
Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Machine Learning and Algorithms
Methods: Attention Is All You Need · fail · Linear Layer · Adam · Layer Normalization · Dense Connections · Absolute Position Encodings · Label Smoothing · Dropout · Multi-Head Attention
