Sift or Get Off the PoC: Applying Information Retrieval to Vulnerability Research with SiftRank
Caleb Gross

TL;DR
This paper introduces SiftRank, a scalable LLM-based ranking algorithm that prioritizes candidate security vulnerabilities directly across large datasets, reducing analysis time and cost.
Contribution
SiftRank is a novel, efficient ranking method that operates directly on thousands of items using LLMs, without needing a separate initial filtering step.
Findings
Successfully identified vulnerabilities in under 2 minutes
Operates efficiently on large datasets with thousands of items
Requires minimal infrastructure and no domain-specific fine-tuning
Abstract
Security research is fundamentally a problem of resource constraint and consequent prioritization. There is simply too much attack surface and too little time and energy to spend analyzing it all. The most effective security researchers are often those who are most skilled at intuitively deciding which part of an expansive attack surface to investigate. We demonstrate that this problem of selecting the most promising option from among many possibilities can be reframed as an information retrieval problem, and solved using document ranking techniques with LLMs performing the heavy lifting as general-purpose rankers. We present SiftRank, a ranking algorithm achieving O(n) complexity through three key mechanisms: listwise ranking using an LLM to order documents in small batches of approximately 10 items at a time; inflection-based convergence detection that adaptively terminates ranking…
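The first mechanism named in the abstract, listwise ranking in small batches, can be illustrated with a minimal sketch. This is not the paper's implementation: the names `batched_listwise_rank` and `toy_ranker` are hypothetical, `toy_ranker` is a keyword heuristic standing in for an LLM call, and averaging normalized batch positions over a fixed number of shuffled trials is an assumed aggregation scheme. The inflection-based convergence detection is not modeled here. Each trial makes about n / batch_size ranker calls, so the work grows linearly with the number of documents.

```python
import random
from collections import defaultdict
from typing import Callable, Sequence

# A ranker takes one small batch and returns the same items, best first.
# In practice this would wrap an LLM prompt; here it is only a type alias.
Ranker = Callable[[Sequence[str]], list[str]]


def batched_listwise_rank(
    docs: Sequence[str],
    rank_batch: Ranker,
    batch_size: int = 10,
    trials: int = 5,
    seed: int = 0,
) -> list[str]:
    """Rank unique documents by their mean normalized position across shuffled batches."""
    rng = random.Random(seed)
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)

    for _ in range(trials):
        shuffled = list(docs)
        rng.shuffle(shuffled)  # new batch neighbors on every trial
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            ordered = rank_batch(batch)
            denom = max(len(ordered) - 1, 1)  # avoid dividing by zero for a batch of one
            for pos, doc in enumerate(ordered):
                totals[doc] += pos / denom  # 0.0 = top of batch, 1.0 = bottom
                counts[doc] += 1

    # Lower mean position means the document was ranked higher more consistently.
    return sorted(docs, key=lambda d: totals[d] / counts[d])


def toy_ranker(batch: Sequence[str]) -> list[str]:
    """Stand-in for the LLM (assumption): items mentioning memcpy sort first."""
    return sorted(batch, key=lambda d: "memcpy" not in d)


if __name__ == "__main__":
    target = "memcpy(dst, src, user_len)"
    docs = [f"func_{i}()" for i in range(12)] + [target] + [f"func_{i}()" for i in range(12, 25)]
    ranked = batched_listwise_rank(docs, toy_ranker)
    print(ranked[:3])
```

Because every batch has its own top item, a single trial cannot separate the winners of different batches. Repeating the process with reshuffled batches is what lets a consistently strong item pull ahead of items that merely landed in a weak batch.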
Taxonomy
Topics: Information and Cyber Security · Advanced Malware Detection Techniques · Web Application Security Vulnerabilities
