Advancing Language Models for Code-related Tasks
Zhao Tian

TL;DR
This paper introduces novel techniques to improve language models for code tasks by enhancing data quality, architecture, and reasoning, aiming to boost their practical use in software engineering.
Contribution
It presents new data augmentation, architecture, and reasoning methods that collectively advance the capabilities of code-related language models.
Findings
Improved code data quality with code difference-guided adversarial augmentation (CODA) and code denoising (CodeDenoise).
Enhanced model architecture with syntax-guided code LMs (LEAM and LEAM++).
Advanced model reasoning with a prompting technique (muFiX) and an agent-based technique (Specine).
Abstract
Recent advances in language models (LMs) have driven significant progress in various software engineering tasks. However, existing LMs still struggle with complex programming scenarios due to limitations in data quality, model architecture, and reasoning capability. This research systematically addresses these challenges through three complementary directions: (1) improving code data quality with a code difference-guided adversarial augmentation technique (CODA) and a code denoising technique (CodeDenoise); (2) enhancing model architecture via syntax-guided code LMs (LEAM and LEAM++); and (3) advancing model reasoning with a prompting technique (muFiX) and an agent-based technique (Specine). These techniques aim to promote the practical adoption of LMs in software development and further advance intelligent software engineering.
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Model-Driven Software Engineering Techniques
