KOCO-BENCH: Can Large Language Models Leverage Domain Knowledge in Software Development?
Xue Jiang, Ge Li, Jiaru Qian, Xianjie Shi, Chenjie Li, Hao Zhu, Ziyu Wang, Jielun Zhang, Zheyu Zhao, Lingwei Wu, Kechi Zhang, Jia Li, Wenpin Jiao, Zhi Jin, Yihong Dong

TL;DR
KOCO-BENCH is a new benchmark for evaluating how well large language models can learn and apply domain-specific knowledge in software development across multiple specialized domains.
Contribution
The paper introduces KOCO-BENCH, a benchmark that pairs knowledge corpora with multi-level tasks to assess domain specialization methods for LLMs in real-world software projects.
Findings
State-of-the-art LLMs struggle with domain-specific tasks.
Existing domain specialization methods yield only marginal improvements.
The best model achieves just 34.2% accuracy, indicating significant challenges.
Abstract
Large language models (LLMs) excel at general programming but struggle with domain-specific software development, which motivates domain specialization methods that help LLMs learn and use domain knowledge and data. However, existing domain-specific code benchmarks cannot evaluate the effectiveness of such methods: they assess what knowledge LLMs already possess rather than how they acquire and apply new knowledge, and they lack explicit knowledge corpora for developing domain specialization methods. To this end, we present KOCO-BENCH, a novel benchmark designed for evaluating domain specialization methods in real-world software development. KOCO-BENCH contains 6 emerging domains with 11 software frameworks and 25 projects, featuring curated knowledge corpora alongside multi-granularity evaluation tasks including domain code generation (from function-level to…
