SoTaNa: The Open-Source Software Development Assistant
Ensheng Shi, Fengji Zhang, Yanlin Wang, Bei Chen, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

TL;DR
SoTaNa is an open-source software development assistant that fine-tunes LLaMA with instruction data generated by ChatGPT, enabling effective software engineering support on a single GPU.
Contribution
It applies parameter-efficient fine-tuning to the open-source foundation model LLaMA, using software engineering instruction data generated by ChatGPT, to improve tasks such as answering Stack Overflow questions.
Findings
Effective in answering Stack Overflow questions
Capable of code summarization and generation
Operates on a single GPU
Abstract
Software development plays a crucial role in driving innovation and efficiency across modern societies. To meet the demands of this dynamic field, there is a growing need for an effective software development assistant. However, existing large language models represented by ChatGPT suffer from limited accessibility, including training data and model weights. Although other large open-source models like LLaMA have shown promise, they still struggle with understanding human intent. In this paper, we present SoTaNa, an open-source software development assistant. SoTaNa utilizes ChatGPT to generate high-quality instruction-based data for the domain of software engineering and employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA. We evaluate the effectiveness of SoTaNa in answering Stack Overflow questions and demonstrate its capabilities.…
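The abstract does not say which parameter-efficient fine-tuning technique is used. As a rough illustration of the general idea only, the sketch below uses a low-rank adapter, one common approach, and this choice is an assumption rather than something stated in the text above. The class name `LowRankAdapter`, its methods, and the toy 8x8 weight matrix are all hypothetical. The pretrained weight stays frozen and only two small matrices would be trained, which is why such methods can fit on modest hardware.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


class LowRankAdapter:
    """Hypothetical low-rank adapter: effective weight = W + B @ A.

    W (d_out x d_in) is frozen; only B (d_out x r) and A (r x d_in)
    would be updated during fine-tuning. Illustrative assumption only.
    """

    def __init__(self, weight, rank, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight
        # B starts at zero, so the adapter initially leaves W unchanged.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]

    def effective_weight(self):
        return add(self.weight, matmul(self.B, self.A))

    def trainable_parameters(self):
        return len(self.B) * len(self.B[0]) + len(self.A) * len(self.A[0])

    def frozen_parameters(self):
        return len(self.weight) * len(self.weight[0])


# Toy stand-in for one pretrained layer: an 8x8 identity matrix.
W = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
adapter = LowRankAdapter(W, rank=2)
print(adapter.trainable_parameters(), "trainable vs", adapter.frozen_parameters(), "frozen")
```

With rank 2 on an 8x8 layer, the adapter trains 32 values while 64 stay frozen; on real model layers, where the dimensions are in the thousands, the trainable share becomes a small fraction of the total.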
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Software Engineering Techniques and Practices
