Agentic Repository Mining: A Multi-Task Evaluation

Johannes Härtel

arXiv:2605.04845 · cs.SE · May 7, 2026

TL;DR

This paper evaluates LLM agents that explore software repositories via bash commands to classify artifacts. The agents prove robust and reach accuracy competitive with LLMs that receive pre-engineered context.

Contribution

It introduces a multi-task evaluation of dynamic LLM agents for repository classification, highlighting their scalability and robustness over traditional static approaches.

Findings

01

Agents achieve competitive accuracy across four tasks.

02

Agents avoid context-window overflows and scale with artifact size.

03

A manual diagnosis of 100 disagreements with the ground truth suggests that broader context access may improve accuracy.

Abstract

Mining software repositories often requires classifying artifacts like commits, reviews, code lines, or entire repositories into categories. Human labeling is expensive and error-prone; limited context frequently leads to misclassifications or uncertainty in labels. We investigate whether LLM agents that dynamically explore repositories through standard bash commands can match the classification quality of simple LLMs that receive pre-engineered context. Across four tasks, eight approach configurations, and 4943 classifications, agents achieve competitive accuracy despite retrieving their own context. The primary advantage is robustness: agents avoid context-window overflows and scale independently of artifact size. A manual diagnosis of 100 cases where approaches disagree with the ground truth reveals specification ambiguities and labels produced under limited context, suggesting that…
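The agentic setup described in the abstract can be pictured as a loop in which a model repeatedly chooses a bash command, reads its output, and eventually commits to a label. The following is a minimal sketch under stated assumptions, not the paper's implementation: `classify`, `run_bash`, `toy_policy`, the `MAX_OUTPUT_CHARS` cap, and the `bugfix`/`other` labels are all hypothetical names introduced for illustration, and the keyword rule in `toy_policy` merely stands in for an LLM call. Truncating each command's output is one plausible way to keep the context bounded regardless of artifact size; the abstract does not say how the evaluated agents achieve this.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

# Assumed cap on how much output a single command may feed back to the agent.
MAX_OUTPUT_CHARS = 2000

# One transcript entry: (command that was run, output that was observed).
Step = Tuple[str, str]
# A policy maps the transcript so far to ("run", command) or ("label", label).
Policy = Callable[[List[Step]], Tuple[str, str]]


def run_bash(command: str, cwd: str = ".") -> str:
    """Run one bash command and return its truncated stdout and stderr."""
    result = subprocess.run(
        ["bash", "-c", command],
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=30,
    )
    return (result.stdout + result.stderr)[:MAX_OUTPUT_CHARS]


def classify(
    policy: Policy,
    runner: Callable[[str], str] = run_bash,
    max_steps: int = 10,
) -> Optional[str]:
    """Let the policy explore via commands until it emits a label.

    Returns None if no label is produced within max_steps.
    """
    transcript: List[Step] = []
    for _ in range(max_steps):
        action, value = policy(transcript)
        if action == "label":
            return value
        # Any other action is treated as a command to execute.
        transcript.append((value, runner(value)))
    return None


def toy_policy(transcript: List[Step]) -> Tuple[str, str]:
    """Hypothetical stand-in for an LLM: inspect the latest commit subject."""
    if not transcript:
        return ("run", "git log -1 --format=%s")
    _, output = transcript[-1]
    return ("label", "bugfix" if "fix" in output.lower() else "other")
```

Calling `classify(toy_policy)` inside a git checkout would run a single `git log` command and label the latest commit. Passing a different `runner` makes the loop usable without a shell, which is convenient for testing the control flow in isolation.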
