Secret Leak Detection in Software Issue Reports using LLMs: A Comprehensive Evaluation
Sadif Ahmed, Md Nafiu Rahman, Zahin Wahab, Gias Uddin, Rifat Shahriyar

TL;DR
This paper presents a comprehensive evaluation of secret leak detection in GitHub issue reports using LLMs, introducing a new benchmark dataset and a hybrid detection pipeline that outperforms prior methods.
Contribution
It introduces a large-scale benchmark dataset and a hybrid detection pipeline combining regex and LLMs for effective secret leak detection in issue reports.
Findings
Regex- and entropy-based methods achieve high recall but low precision.
Open-source LLMs such as Qwen and LLaMA achieve an F1 score of up to 94.49%.
The approach generalizes well to real-world GitHub repositories.
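The recall/precision trade-off of entropy heuristics noted above can be illustrated with a minimal sketch. It assumes a character-level Shannon entropy score with a fixed cutoff. The function names, the 3.5-bit threshold, and the 16-character minimum length are hypothetical choices for illustration, not values taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Character-level Shannon entropy of s, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_like_secret(token: str, threshold: float = 3.5, min_len: int = 16) -> bool:
    """Flag a token as a candidate secret if it is long and high-entropy.

    threshold and min_len are illustrative assumptions, not tuned values.
    """
    return len(token) >= min_len and shannon_entropy(token) >= threshold


# A harmless string with 16 distinct characters is still flagged,
# which is the kind of false positive that lowers precision.
print(looks_like_secret("abcdefghijklmnop"))
print(looks_like_secret("a" * 40))
```

A rule like this flags any long, varied string, including hashes, UUIDs, and encoded blobs, so it rarely misses a real secret but also raises many false alarms.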
Abstract
In the digital era, accidental exposure of sensitive information such as API keys, tokens, and credentials is a growing security threat. While most prior work focuses on detecting secrets in source code, leakage in software issue reports remains largely unexplored. This study fills that gap through a large-scale analysis and a practical detection pipeline for exposed secrets in GitHub issues. Our pipeline combines regular expression-based extraction with large language model (LLM)-based contextual classification to detect real secrets and reduce false positives. We build a benchmark of 54,148 instances from public GitHub issues, including 5,881 manually verified true secrets. Using this dataset, we evaluate entropy-based baselines and keyword heuristics used by prior secret detection tools, classical machine learning, deep learning, and LLM-based methods. Regex and entropy based…
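The two-stage idea described in the abstract, regex-based candidate extraction followed by contextual classification, might be sketched as follows. This is a sketch under stated assumptions rather than the authors' implementation. The two patterns are commonly cited token formats chosen for illustration, and `Candidate`, `extract_candidates`, `detect_secrets`, and `placeholder_classifier` are hypothetical names. The classifier is a keyword stand-in for the place where an LLM call would go.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Illustrative patterns only; the paper's actual pattern set is not shown here.
CANDIDATE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


@dataclass
class Candidate:
    kind: str     # which pattern matched
    value: str    # the matched string
    context: str  # surrounding issue text passed to the classifier


def extract_candidates(issue_text: str, window: int = 80) -> List[Candidate]:
    """Stage 1: high-recall regex extraction with a context window."""
    found = []
    for kind, pattern in CANDIDATE_PATTERNS.items():
        for m in pattern.finditer(issue_text):
            start = max(0, m.start() - window)
            end = min(len(issue_text), m.end() + window)
            found.append(Candidate(kind, m.group(0), issue_text[start:end]))
    return found


def detect_secrets(
    issue_text: str, classify: Callable[[Candidate], bool]
) -> List[Candidate]:
    """Stage 2: keep only candidates the contextual classifier accepts."""
    return [c for c in extract_candidates(issue_text) if classify(c)]


def placeholder_classifier(c: Candidate) -> bool:
    """Hypothetical stand-in for an LLM call: rejects candidates whose
    context suggests a documentation example rather than a real secret."""
    lowered = c.context.lower()
    return not any(w in lowered for w in ("example", "placeholder", "dummy"))
```

Passing the classifier as a callable keeps the extraction stage independent of any particular model, so the keyword stand-in could be swapped for a prompt-based LLM classifier without changing the rest of the sketch.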
