Agentic Repository Mining: A Multi-Task Evaluation
Johannes Härtel

TL;DR
This paper evaluates LLM agents that explore software repositories via bash commands to classify artifacts, demonstrating robustness and competitive accuracy compared to static LLMs with pre-engineered context.
Contribution
It introduces a multi-task evaluation of dynamic LLM agents for repository classification, highlighting their scalability and robustness over traditional static approaches.
Findings
Agents achieve competitive accuracy across four tasks.
Agents avoid context-window overflows and scale independently of artifact size.
Manual diagnosis of disagreements suggests that broader context access may improve accuracy.
Abstract
Mining software repositories often requires classifying artifacts like commits, reviews, code lines, or entire repositories into categories. Human labeling is expensive and error-prone; limited context frequently leads to misclassifications or uncertainty in labels. We investigate whether LLM agents that dynamically explore repositories through standard bash commands can match the classification quality of simple LLMs that receive pre-engineered context. Across four tasks, eight approach configurations, and 4943 classifications, agents achieve competitive accuracy despite retrieving their own context. The primary advantage is robustness: agents avoid context-window overflows and scale independently of artifact size. A manual diagnosis of 100 cases where approaches disagree with the ground truth reveals specification ambiguities and labels produced under limited context, suggesting that…
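The core idea in the abstract is an agent that gathers its own context by issuing shell commands before it commits to a label. It can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the paper's implementation. The `Policy` callable stands in for an LLM call, and `classify`, `run_command`, `scripted_policy`, `MAX_OBS_CHARS` and `MAX_STEPS` are hypothetical names and limits. Commands run through the system shell (`shell=True`, i.e. `/bin/sh` on POSIX). Each observation is truncated, which is one way such a loop can keep its context bounded regardless of artifact size.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# An action is either ("run", <shell command>) or ("label", <category>).
Action = Tuple[str, str]
# A policy maps the interaction history to the next action. In a real agent
# this would be an LLM call; here it is a hypothetical stand-in.
Policy = Callable[[List[str]], Action]

MAX_OBS_CHARS = 2000  # hypothetical per-observation character budget
MAX_STEPS = 10        # hypothetical limit on exploration steps


def run_command(cmd: str, repo: Path) -> str:
    """Run one shell command inside the repository and return bounded output."""
    result = subprocess.run(
        cmd, shell=True, cwd=repo, capture_output=True, text=True, timeout=30
    )
    output = result.stdout + result.stderr
    # Truncate so that no single observation can overflow the context window.
    return output[:MAX_OBS_CHARS]


def classify(task: str, repo: Path, policy: Policy) -> str:
    """Let the policy explore `repo` with shell commands until it emits a label."""
    history = [f"TASK: {task}"]
    for _ in range(MAX_STEPS):
        kind, payload = policy(history)
        if kind == "label":
            return payload
        # Record the command and its (truncated) output as new context.
        history.append(f"$ {payload}")
        history.append(run_command(payload, repo))
    return "undecided"  # step budget exhausted without a label


def scripted_policy(history: List[str]) -> Action:
    """Hypothetical fixed policy: list the files once, then label from the output."""
    if len(history) == 1:
        return ("run", "ls")
    return ("label", "documented" if "README" in history[-1] else "undocumented")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "README.md").write_text("hello")
        print(classify("Is the repository documented?", Path(d), scripted_policy))
```

Unlike a static prompt that must fit the whole artifact up front, the total context here grows by at most `MAX_OBS_CHARS` per step, so its size depends on the step budget rather than on the repository.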
