AICoderEval: Improving AI Domain Code Generation of Large Language Models
Yinghui Xia, Yuyan Chen, Tianyu Shi, Jun Wang, Jinsong Yang

TL;DR
This paper introduces AICoderEval, a comprehensive dataset for evaluating AI models' ability to generate high-level, real-world task-specific code across multiple domains, and proposes a new framework and model to enhance this capability.
Contribution
The paper presents AICoderEval, a new dataset for real-world code generation tasks, and introduces CoderGen and AICoder, a framework and a model that significantly improve task-specific code generation performance.
Findings
AICoder outperforms existing code generation models.
CoderGen improves LLMs' code generation by 12% on pass@1.
AICoderEval covers diverse real-world domains.
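The pass@1 figure above is an execution-based metric: a task counts as solved only if a generated program passes its tests. The sketch below computes pass@k with the unbiased estimator popularized by the Codex/HumanEval work, pass@k = 1 - C(n - c, k) / C(n, k), where n samples are drawn per task and c of them pass. It is an assumption that AICoderEval computes pass@k exactly this way, and the helper names `pass_at_k` and `mean_pass_at_k` are hypothetical. For k = 1 the formula reduces to c / n.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task (assumed estimator, not confirmed for AICoderEval).

    n: number of programs sampled for the task
    c: number of those programs that pass all test cases
    k: evaluation budget (k = 1 gives pass@1)
    """
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing program.
        return 1.0
    # Probability that a random size-k subset contains no passing program, complemented.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Benchmark-level score: average the per-task estimate over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, a task with 10 samples of which 3 pass has an estimated pass@1 of 0.3. Whether the reported 12% gain is absolute or relative is not stated here, so it should not be read off this sketch.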
Abstract
Automated code generation is a pivotal capability of large language models (LLMs). However, assessing this capability in real-world scenarios remains challenging. Previous methods focus more on low-level code generation, such as model loading, than on generating high-level code for real-world tasks, such as image-to-text and text classification, across various domains. Therefore, we construct AICoderEval, a dataset focused on real-world tasks in various domains based on HuggingFace, PyTorch, and TensorFlow, along with comprehensive metrics for evaluating and enhancing LLMs' task-specific code generation capability. AICoderEval contains test cases and complete programs for automated evaluation of these tasks, covering domains such as natural language processing, computer vision, and multimodal learning. To facilitate research in this area, we open-source the AICoderEval dataset at…
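Because each AICoderEval task ships with test cases, a generated program can be scored automatically by executing it together with those tests. The sketch below shows one minimal way to do that: write the candidate program and its test code to a temporary file, run it in a separate Python process with a timeout, and treat a zero exit status as a pass. The function name `run_candidate`, the single-file layout, and the 30-second default timeout are hypothetical choices, not the dataset's actual harness; real tasks that load HuggingFace, PyTorch, or TensorFlow models would also need those libraries installed and a much longer timeout.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_candidate(program: str, test_code: str, timeout: float = 30.0) -> bool:
    """Return True if the candidate program followed by its tests exits with status 0.

    Hypothetical harness: failed assertions, uncaught exceptions, and timeouts
    all count as failures.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate_test.py"
        # Append the test code after the generated program so the tests can call it.
        script.write_text(program + "\n\n" + test_code + "\n", encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=tmp,
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child process before re-raising.
            return False
        return proc.returncode == 0
```

Running each candidate in its own process keeps a crashing or hanging program from taking down the evaluator, and the per-sample pass/fail results can then be counted to obtain the c in a pass@k computation.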
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
