From Threads to Trajectories: A Multi-LLM Pipeline for Community Knowledge Extraction from GitHub Issue Discussions
Nazia Shehnaz Joynab, Soneya Binta Hossain

TL;DR
This paper introduces SWE-MIMIC-Bench, a dataset of structured, coherent issue trajectories extracted from GitHub discussions by an automated multi-LLM pipeline, intended to aid developers and to train expert-like LLM agents.
Contribution
The paper presents a multi-LLM pipeline that generates detailed, label-aware issue trajectories from raw GitHub discussions, both to help developers follow long threads and to provide training data for AI agents.
Findings
Achieved 91.7% success rate in extracting high-fidelity trajectories.
Generated 734 detailed issue trajectories from 800 real-world GitHub issues.
Demonstrated the system's effectiveness on multiple SWE-Bench datasets.
Abstract
Resolving complex post-production issues in large-scale open-source software (OSS) projects requires significant cognitive effort, because developers must first work through long, unstructured, and fragmented issue discussion threads. In this paper, we present SWE-MIMIC-Bench, an issue trajectory dataset generated from raw GitHub discussions using an automated multi-LLM pipeline. Unlike simple summarization, this pipeline uses a group of closed-source LLMs to perform granular tasks: analyzing individual comments with awareness of externally linked resources, classifying comment analyses into label-specific fields (e.g., root cause, solution plan, implementation progress), and synthesizing label-aware trajectories that capture a structured and coherent narrative of the entire discussion thread. Our pipeline uses five closed-source LLM configurations for distinct purposes: label…
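The three granular tasks named in the abstract (per-comment analysis, classification into label-specific fields, and trajectory synthesis) can be pictured with a minimal Python sketch. This is an illustration only, not the authors' implementation: the `Comment` type, the function names, the prompt wording, the `"other"` fallback label, and the use of three model roles rather than the paper's five configurations are all assumptions. Each model is represented by a plain callable that maps a prompt string to a response string, so any real LLM client could be substituted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical label set: the abstract names only these three as examples.
LABELS = ("root_cause", "solution_plan", "implementation_progress")

# Stand-in for a model call: prompt text in, response text out.
LLM = Callable[[str], str]


@dataclass
class Comment:
    author: str
    body: str
    # Text of externally linked resources (assumed to be fetched beforehand).
    linked_resources: List[str] = field(default_factory=list)


def analyze_comment(llm: LLM, comment: Comment) -> str:
    """Stage 1: analyze one comment together with its linked resources."""
    context = "\n".join(comment.linked_resources)
    return llm(
        f"Analyze this comment.\nLinked resources:\n{context}\n"
        f"Comment:\n{comment.body}"
    )


def classify_analysis(llm: LLM, analysis: str) -> str:
    """Stage 2: assign the analysis to a label, falling back to 'other'."""
    label = llm(f"Pick one label from {LABELS} for:\n{analysis}").strip()
    return label if label in LABELS else "other"


def build_trajectory(
    comments: List[Comment], analyzer: LLM, classifier: LLM, synthesizer: LLM
) -> Dict[str, str]:
    """Stage 3: group analyses by label and synthesize one narrative per label."""
    fields: Dict[str, List[str]] = {}
    for comment in comments:
        analysis = analyze_comment(analyzer, comment)
        label = classify_analysis(classifier, analysis)
        fields.setdefault(label, []).append(analysis)
    return {
        label: synthesizer("Synthesize a coherent narrative:\n" + "\n".join(items))
        for label, items in fields.items()
    }
```

Passing the models in as callables keeps each stage independently testable with deterministic stubs, which matters when different LLM configurations are assigned to different purposes.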
