Top General Performance = Top Domain Performance? DomainCodeBench: A Multi-domain Code Generation Benchmark

Dewu Zheng; Yanlin Wang; Ensheng Shi; Xilin Liu; Yuchi Ma; Hongyu Zhang; Zibin Zheng

arXiv:2412.18573·cs.SE·March 18, 2025

TL;DR

This paper introduces DomainCodeBench, a comprehensive multi-domain code generation benchmark, revealing that top general models often underperform in specific domains and that domain-specific knowledge augmentation significantly improves results.

Contribution

The paper presents a new multi-domain code generation benchmark and provides extensive analysis of LLM performance across diverse application domains, highlighting domain-specific challenges and improvements.

Findings

01. Top general models do not consistently perform well across all domains.

02. LLMs often struggle with domain knowledge gaps and library usage.

03. Augmenting prompts with domain-specific information boosts performance by ~38%.

Abstract

With the rapid advancement of large language models (LLMs), extensive research has been conducted to investigate the code generation capabilities of LLMs. However, existing efforts primarily focus on general-domain tasks, leaving LLMs' code generation performance in real-world application domains underexplored. This raises a critical question: can a model's general-domain coding ability reliably represent its ability in specialized domains? In this paper, we introduce DomainCodeBench, a multi-domain code generation benchmark designed to systematically evaluate LLMs across 12 software application domains and 15 programming languages. DomainCodeBench contains 2,400 manually verified tasks with ground truth, human-annotated docstrings, and fine-grained dependency information to ensure more coverage of domain-specific challenges. Specifically, we first identify the most popular application…

Taxonomy

Topics: Digital Rights Management and Security

Methods: Lib · Focus