Deep-Bench: Deep Learning Benchmark Dataset for Code Generation
Alireza Daghighfarsoodeh, Chung-Yu Wang, Hamed Taherkhani, Melika Sepidband, Mohammad Abdollahi, Hadi Hemmati, Hung Viet Pham

TL;DR
DeepBench is a comprehensive benchmark dataset for function-level deep learning code generation, revealing significant challenges and performance disparities among large language models across different DL tasks and phases.
Contribution
The paper introduces DeepBench, a new benchmark dataset covering full DL pipelines, and provides an analysis of LLM performance and issues in DL code generation.
Findings
GPT-4 achieved 31% accuracy on DeepBench
LLMs perform significantly worse on DeepBench than on existing benchmarks
Performance varies substantially across DL phases and tasks
Abstract
Deep learning (DL) has revolutionized areas such as computer vision and natural language processing. However, developing DL systems is challenging due to the complexity of DL workflows. Large Language Models (LLMs) such as GPT, Claude, Llama, and Mistral have emerged as promising tools to assist in DL code generation, offering potential solutions to these challenges. Despite this, existing benchmarks such as DS-1000 are limited: they focus primarily on small DL code snippets related to pre/post-processing tasks and lack comprehensive coverage of the full DL pipeline, including different DL phases and input data types. To address this, we introduce DeepBench, a novel benchmark dataset designed for function-level DL code generation. DeepBench categorizes DL problems based on three key aspects: phases such as pre-processing, model construction, and training; tasks,…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Materials Science · Natural Language Processing Techniques
Methods: Linear Layer · Multi-Head Attention · Adam · Softmax · Dropout · Weight Decay · Cosine Annealing · Linear Warmup With Cosine Annealing · Dense Connections · Attention Dropout
