An Empirical Evaluation of Competitive Programming AI: A Case Study of AlphaCode

Sila Lertbanjongngam; Bodin Chinthanet; Takashi Ishio; Raula Gaikovina Kula; Pattara Leelaprute; Bundit Manaskasemsak; Arnon Rungsawang; Kenichi Matsumoto

arXiv:2208.08603·cs.SE·August 29, 2022

TL;DR

This study empirically evaluates AlphaCode, a code generation AI for competitive programming, by comparing its generated code with human solutions. The generated code is similar to human code (average maximum similarity score of 0.56) and performs on par with or worse than human code in execution time and memory usage.

Contribution

It provides the first empirical analysis of AlphaCode's generated code similarity and performance, highlighting strengths and limitations in practical competitive programming scenarios.

Findings

1. AlphaCode-generated codes are similar to human codes, with an average maximum similarity score of 0.56.

2. Generated code performs on par with or worse than human code in execution time and memory usage.

3. AlphaCode tends to generate solutions that are more similar to human solutions for low-difficulty problems, and less efficient code for high-difficulty problems.

Abstract

AlphaCode is a code generation system for assisting software developers in solving competitive programming problems using natural language problem descriptions. Despite the advantages of the code generating system, the open source community expressed concerns about practicality and data licensing. However, there is no research investigating generated codes in terms of code clone and performance. In this paper, we conduct an empirical study to find code similarities and performance differences between AlphaCode-generated codes and human codes. The results show that (i) the generated codes from AlphaCode are similar to human codes (i.e., the average maximum similarity score is 0.56) and (ii) the generated code performs on par with or worse than the human code in terms of execution time and memory usage. Moreover, AlphaCode tends to generate more similar codes to humans for low-difficulty…
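One plausible reading of the "average maximum similarity score" is: for each generated solution, take its highest similarity to any human solution for the same problem, then average those maxima over all generated solutions. The sketch below illustrates that aggregation only. The similarity function is an assumption: it uses Python's `difflib.SequenceMatcher` as a stand-in and is not the measure used in the paper. The function names `similarity` and `average_max_similarity` are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] (assumption, not the paper's measure)."""
    return SequenceMatcher(None, a, b).ratio()


def average_max_similarity(generated: list[str], human: list[str]) -> float:
    """Average, over generated solutions, of the best match among human solutions."""
    if not generated or not human:
        raise ValueError("both solution lists must be non-empty")
    # For each generated solution keep only its closest human solution,
    # then average those per-solution maxima.
    return mean(max(similarity(g, h) for h in human) for g in generated)


if __name__ == "__main__":
    generated = ["print(sum(map(int, input().split())))"]
    human = [
        "a, b = map(int, input().split())\nprint(a + b)",
        "print(sum(map(int, input().split())))",
    ]
    print(average_max_similarity(generated, human))
```

Under this reading, a score of 1.0 would mean every generated solution has an identical human counterpart, so the reported 0.56 indicates partial rather than verbatim overlap.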

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Machine Learning and Data Classification