From Reviewers' Lens: Understanding Bug Bounty Report Invalid Reasons with LLMs

Jiangrui Zheng, Yingming Zhou, Ali Abdullah Ahmad, Hanqing Yao, and Xueqing Liu

arXiv:2511.18608 · cs.SE · November 25, 2025

Open Access

TL;DR

This paper investigates the use of large language models to identify invalid bug bounty reports, proposing a retrieval-augmented approach to improve detection accuracy and understand reviewer decision factors.

Contribution

The paper introduces a taxonomy of rejection reasons and a retrieval-augmented generation (RAG) framework that uses it to improve the classification and interpretability of invalid reports on bug bounty platforms.
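To make the idea of pairing a rejection-reason taxonomy with retrieval more concrete, here is a minimal sketch of the retrieval and prompt-building step only. It is an illustration under stated assumptions, not the authors' implementation: the taxonomy entries, the bag-of-words cosine similarity, and the names `TAXONOMY`, `retrieve`, and `build_prompt` are all hypothetical, and the call to an actual language model is left out.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder taxonomy; these are NOT the paper's actual categories.
TAXONOMY = {
    "out_of_scope": "the affected asset or behavior is outside the program scope",
    "no_security_impact": "the disclosed information is public or has no security impact",
    "insufficient_evidence": "the report lacks reproduction steps or proof of concept",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(report, k=2):
    """Return the k taxonomy entries most similar to the report text."""
    query = Counter(tokenize(report))
    ranked = sorted(
        TAXONOMY.items(),
        key=lambda item: cosine(query, Counter(tokenize(item[1]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(report, k=2):
    """Assemble a classification prompt that includes the retrieved reasons."""
    reasons = "\n".join(f"- {name}: {desc}" for name, desc in retrieve(report, k))
    return (
        "Candidate rejection reasons:\n"
        f"{reasons}\n\n"
        f"Report:\n{report}\n\n"
        "Is this report valid or invalid? Cite the reason that applies, if any."
    )


example = "The server version banner is public information with no security impact"
prompt = build_prompt(example)
```

A real system would likely swap the bag-of-words similarity for learned embeddings and might retrieve past labeled reports as well; the sketch only shows how retrieved taxonomy entries can ground the prompt so that a model's decision cites an explicit reason.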

Findings

01

Models such as GPT-5, DeepSeek, and a fine-tuned RoBERTa achieve strong overall accuracy but tend to over-accept invalid reports.

02

The RAG framework improves classification consistency and reduces bias.

03

Reporter reputation influences decision outcomes in borderline cases.

Abstract

Bug bounty platforms (e.g., HackerOne, BugCrowd) leverage crowd-sourced vulnerability discovery to improve continuous coverage, reduce the cost of discovery, and serve as an integral complement to internal red teams. With the rise of AI-generated bug reports, little work exists to help bug hunters understand why these reports are labeled as invalid. To improve report quality and reduce reviewers' burden, it is critical to predict invalid reports and interpret invalid reasons. In this work, we conduct an empirical study with the purpose of helping bug hunters understand the validity of reports. We collect a dataset of 9,942 disclosed bug bounty reports, including 1,400 invalid reports, and evaluate whether state-of-the-art large language models can identify invalid reports. While models such as GPT-5, DeepSeek, and a fine-tuned RoBERTa achieve strong overall accuracy, they consistently…
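The dataset figures in the abstract help explain how strong overall accuracy can coexist with weak detection of invalid reports. The short calculation below is a sketch under one loud assumption: that a model is scored on the full set of 9,942 reports with 1,400 invalid ones, which the abstract does not state. The function name `always_valid_baseline` is illustrative.

```python
TOTAL_REPORTS = 9942    # disclosed reports in the dataset (from the abstract)
INVALID_REPORTS = 1400  # invalid reports in the dataset (from the abstract)


def always_valid_baseline(total, invalid):
    """Score a trivial classifier that labels every report as valid.

    Assumes evaluation on the whole dataset, which the abstract does not confirm.
    """
    valid = total - invalid
    accuracy = valid / total       # every valid report is classified correctly
    invalid_recall = 0 / invalid   # no invalid report is ever flagged
    return accuracy, invalid_recall


accuracy, invalid_recall = always_valid_baseline(TOTAL_REPORTS, INVALID_REPORTS)
print(f"accuracy={accuracy:.3f}, invalid recall={invalid_recall:.3f}")
```

Under that assumption, a classifier that never rejects anything is already right on 8,542 of 9,942 reports, or roughly 86%, while catching none of the invalid ones. This is why per-class metrics such as recall on the invalid class are more informative here than accuracy alone.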

Taxonomy

Topics: Academic integrity and plagiarism · Advanced Malware Detection Techniques · Web Application Security Vulnerabilities