Augmenting the Generality and Performance of Large Language Models for Software Engineering
Fabian C. Peña

TL;DR
This paper explores enhancing large language models for broader software engineering tasks beyond code, focusing on understanding their capabilities, evaluating their knowledge, and detecting hallucinations to improve their utility.
Contribution
It introduces new benchmarks, evaluates diverse LLMs on non-code SE tasks, and proposes methods for hallucination detection, expanding LLM applications in software engineering.
Findings
Performance improvements on non-code SE tasks
Effective hallucination detection methods
Evaluation of LLMs as sources of foundational SE knowledge
Abstract
Large Language Models (LLMs) are revolutionizing software engineering (SE), with special emphasis on code generation and analysis. However, their applications to broader SE practices, including conceptualization, design, and other non-code tasks, remain partially underexplored. This research aims to augment the generality and performance of LLMs for SE by (1) advancing the understanding of how LLMs with different characteristics perform on various non-code tasks, (2) evaluating them as sources of foundational knowledge in SE, and (3) effectively detecting hallucinations in SE statements. The expected contributions include a variety of LLMs trained and evaluated on domain-specific datasets, new benchmarks on foundational knowledge in SE, and methods for detecting hallucinations. Initial results in terms of performance improvements on various non-code tasks are promising.
Taxonomy
Topics: Topic Modeling · Software System Performance and Reliability · Business Process Modeling and Analysis
