Top General Performance = Top Domain Performance? DomainCodeBench: A Multi-domain Code Generation Benchmark
Dewu Zheng, Yanlin Wang, Ensheng Shi, Xilin Liu, Yuchi Ma, Hongyu Zhang, Zibin Zheng

TL;DR
This paper introduces DomainCodeBench, a comprehensive multi-domain code generation benchmark, revealing that top general models often underperform in specific domains and that domain-specific knowledge augmentation significantly improves results.
Contribution
The paper presents a new multi-domain code generation benchmark and provides extensive analysis of LLM performance across diverse application domains, highlighting domain-specific challenges and improvements.
Findings
Top general models do not consistently perform well across all domains.
LLMs often struggle with domain knowledge gaps and library usage.
Augmenting prompts with domain-specific information boosts performance by roughly 38%.
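The last finding can be illustrated with a minimal sketch of prompt augmentation. It is not the paper's actual pipeline: the `Task` record, its field names, the `build_prompt` helper, and the example task are all hypothetical, chosen only to show how domain context and dependency hints might be prepended to an otherwise unchanged prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative, not the benchmark's schema."""
    domain: str
    signature: str
    docstring: str
    dependencies: list[str] = field(default_factory=list)


def build_prompt(task: Task, augment: bool = False) -> str:
    """Return a generation prompt, optionally prefixed with domain context."""
    parts = []
    if augment:
        # Assumed augmentation format: comment lines naming the domain and
        # the dependencies the reference solution relies on.
        parts.append(f"# Domain: {task.domain}")
        for dep in task.dependencies:
            parts.append(f"# Available dependency: {dep}")
    parts.append(task.signature)
    parts.append(f'    """{task.docstring}"""')
    return "\n".join(parts)


# Illustrative task only; not taken from the benchmark.
task = Task(
    domain="web development",
    signature="def render_profile(request, user_id):",
    docstring="Render the profile page for the given user.",
    dependencies=["django.shortcuts.render", "django.shortcuts.get_object_or_404"],
)

baseline_prompt = build_prompt(task)
augmented_prompt = build_prompt(task, augment=True)
print(augmented_prompt)
```

Both prompts end with the same signature and docstring, so any difference in model output can be attributed to the added context lines. That makes a baseline-versus-augmented comparison straightforward to run per domain.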
Abstract
With the rapid advancement of large language models (LLMs), extensive research has been conducted to investigate the code generation capabilities of LLMs. However, existing efforts primarily focus on general-domain tasks, leaving LLMs' code generation performance in real-world application domains underexplored. This raises a critical question: can a model's general-domain coding ability reliably represent its ability in specialized domains? In this paper, we introduce DomainCodeBench, a multi-domain code generation benchmark designed to systematically evaluate LLMs across 12 software application domains and 15 programming languages. DomainCodeBench contains 2,400 manually verified tasks with ground truth, human-annotated docstrings, and fine-grained dependency information to ensure more coverage of domain-specific challenges. Specifically, we first identify the most popular application…
Taxonomy
Topics: Digital Rights Management and Security
Methods: Lib · Focus
