SELU: A Software Engineering Language Understanding Benchmark
Fabian C. Peña, Steffen Herbold

TL;DR
SELU introduces a comprehensive benchmark for evaluating large language models on diverse software engineering natural language understanding tasks, highlighting the strengths of fine-tuned models and questioning the benefits of domain-specific pre-training.
Contribution
This work presents the first extensive benchmark for SE NLU tasks, evaluates multiple LLMs, and provides insights into the effects of domain adaptation and fine-tuning strategies.
Findings
Fine-tuned models outperform zero-shot and prompt-based approaches.
Domain-specific pre-training does not significantly improve performance.
Fine-tuned models show high performance and low variance across tasks.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in code understanding and generation. However, their effectiveness on non-code Software Engineering (SE) tasks remains underexplored. We present 'Software Engineering Language Understanding' (SELU), the first comprehensive benchmark for evaluating LLMs on 22 NLU tasks over SE textual artifacts, ranging from identifying whether a requirement is functional or non-functional to estimating the effort required to implement a development task. SELU covers classification, regression, Named Entity Recognition (NER), and Masked Language Modeling (MLM) tasks, with data drawn from diverse sources such as issue tracking systems and developer forums. We fine-tune 22 open-source LLMs, both generalist and domain-adapted, and prompt two proprietary alternatives using zero-shot and 3-shot prompting strategies. Performance is measured using…
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Topic Modeling
