SnipGen: A Mining Repository Framework for Evaluating LLMs for Code

Daniel Rodriguez-Cardenas; Alejandro Velasco; Denys Poshyvanyk

arXiv:2502.07046 · cs.SE · February 18, 2025

Open Access

TL;DR

SnipGen is a framework that mines GitHub data to create robust testbeds for evaluating large language models' code generation capabilities, addressing data contamination issues in software engineering research.

Contribution

SnipGen introduces a mining framework and dataset, together with prompt engineering techniques, that enable more accurate and nuanced evaluation of LLMs on code tasks.

Findings

1. Mined approximately 227K data points from GitHub commits.
2. Developed prompt templates for nuanced LLM assessment.
3. Provided a dataset and methodology for rigorous evaluation.

Abstract

Large Language Models (LLMs), such as transformer-based neural networks with billions of parameters, have become increasingly prevalent in software engineering (SE). These models, trained on extensive datasets that include code repositories, exhibit remarkable capabilities for SE tasks. However, evaluating their effectiveness poses significant challenges, primarily due to the potential overlap between the datasets used for training and those employed for evaluation. To address this issue, we introduce SnipGen, a comprehensive repository mining framework designed to leverage prompt engineering across various downstream tasks for code generation. SnipGen aims to mitigate data contamination by generating robust testbeds and crafting tailored data points to assist researchers and practitioners in evaluating LLMs for code-related tasks. In our exploratory study, SnipGen mined approximately…
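To make the idea of contamination-aware data points concrete, the following is a minimal sketch, not SnipGen's actual implementation. It assumes that one way to reduce overlap with training data is to keep only snippets committed after a model's assumed training cutoff, and then to wrap each snippet in a prompt template. The `Snippet` class, the `COMPLETION_TEMPLATE` string, the function names, the example data, and the cutoff date are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Snippet:
    """A hypothetical mined data point: one function taken from a commit."""
    code: str
    docstring: str
    commit_date: date


# Hypothetical template; the templates SnipGen actually uses are not given here.
COMPLETION_TEMPLATE = (
    "Complete the following Python function.\n"
    "Description: {docstring}\n"
    "{signature}\n"
)


def after_cutoff(snippets, cutoff):
    """Keep only snippets committed strictly after the assumed training cutoff."""
    return [s for s in snippets if s.commit_date > cutoff]


def build_prompt(snippet):
    """Fill the template with the docstring and the first line of the code."""
    signature = snippet.code.splitlines()[0]
    return COMPLETION_TEMPLATE.format(
        docstring=snippet.docstring, signature=signature
    )


# Made-up example data, for illustration only.
snippets = [
    Snippet("def add(a, b):\n    return a + b", "Add two numbers.", date(2021, 5, 1)),
    Snippet("def mul(a, b):\n    return a * b", "Multiply two numbers.", date(2023, 3, 1)),
]

fresh = after_cutoff(snippets, cutoff=date(2022, 1, 1))
prompts = [build_prompt(s) for s in fresh]
```

In this sketch only the snippet committed after the assumed cutoff survives the filter, and its signature and docstring become the prompt while the function body is held back as the reference for evaluation.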

Taxonomy

Topics: Digital Rights Management and Security