Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review

Man Fai Wong; Shangxin Guo; Ching Nam Hang; Siu Wai Ho; Chee Wei Tan

arXiv:2307.02503·cs.SE·July 7, 2023

TL;DR

This review surveys how transformer-based large language models trained on Big Code are reshaping AI-assisted programming tasks such as code generation, code completion, and defect detection, and it highlights current applications and open challenges.

Contribution

It provides a comprehensive overview of LLMs in AI-assisted programming, analyzing their applications, challenges, and opportunities in integrating NLP with software naturalness.

Findings

01 LLMs like Codex and AlphaCode enable advanced code generation and understanding.

02 NLP techniques improve defect detection and code summarization.

03 Challenges include data quality, model interpretability, and extending capabilities to mobile development.

Abstract

This paper provides a comprehensive review of the literature concerning the utilization of Natural Language Processing (NLP) techniques, with a particular focus on transformer-based large language models (LLMs) trained using Big Code, within the domain of AI-assisted programming tasks. LLMs, augmented with software naturalness, have played a crucial role in facilitating AI-assisted programming applications, including code generation, code completion, code translation, code refinement, code summarization, defect detection, and clone detection. Notable examples of such applications include GitHub Copilot, powered by OpenAI's Codex, and DeepMind's AlphaCode. This paper presents an overview of the major LLMs and their applications in downstream tasks related to AI-assisted programming. Furthermore, it explores the challenges and opportunities associated with incorporating NLP techniques…
