TraceScope: Interactive URL Triage via Decoupled Checklist Adjudication
Haolin Zhang, William Reber, Yuxuan Zhang, Guofei Gu, and Jeff Huang

TL;DR
TraceScope is an interactive URL triage system that uses sandboxed browsing and LLM-based adjudication to detect sophisticated phishing URLs with high precision and recall.
Contribution
The paper introduces TraceScope, a scalable, safe, and evidence-based URL triage pipeline that improves detection of evasive phishing URLs using decoupled analysis and LLMs.
Findings
Achieves 0.94 precision and 0.78 recall on test URLs.
Outperforms prior classifiers in recall and detection of evasive phishing.
Successfully detects sophisticated phishing in real-world email datasets.
Abstract
Modern phishing campaigns increasingly evade snapshot-based URL classifiers using interaction gates (e.g., checkbox/slider challenges), delayed content rendering, and logo-less credential harvesters. This shifts URL triage from static classification toward an interactive forensics task: an analyst must actively navigate the page while isolating themselves from potential runtime exploits. We present TraceScope, a decoupled triage pipeline that operationalizes this workflow at scale. To prevent the observer effect and ensure safety, a sandboxed operator agent drives a real GUI browser guided by visual motivation to elicit page behavior, freezing the session into an immutable evidence bundle. Separately, an adjudicator agent circumvents LLM context limitations by querying evidence on demand to verify a MITRE ATT&CK checklist, and generates an audit-ready report with extracted indicators…
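The decoupling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names EvidenceBundle, ChecklistItem, freeze_session, and adjudicate, the artifact keys, and the checklist IDs are all hypothetical, and a plain string predicate stands in for the LLM adjudicator. The sketch only shows the shape of the idea: the operator's capture is frozen into a read-only bundle, and the adjudicator pulls just the artifact each checklist item needs instead of loading all evidence at once.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Mapping


@dataclass(frozen=True)
class EvidenceBundle:
    """Read-only record of one browsing session (hypothetical structure)."""
    url: str
    artifacts: Mapping[str, str]  # artifact name -> captured text


def freeze_session(url: str, captured: dict) -> EvidenceBundle:
    # Copy the capture, then wrap it in a read-only view so that
    # later analysis cannot alter what the operator observed.
    return EvidenceBundle(url, MappingProxyType(dict(captured)))


@dataclass(frozen=True)
class ChecklistItem:
    item_id: str                      # hypothetical ID, not a real ATT&CK ID
    description: str
    artifact: str                     # which artifact this item inspects
    predicate: Callable[[str], bool]  # stand-in for an LLM judgment


def adjudicate(bundle: EvidenceBundle, checklist: list) -> list:
    """Check each item against only the artifact it asks for."""
    report = []
    for item in checklist:
        # On-demand query: fetch one artifact, not the whole bundle.
        evidence = bundle.artifacts.get(item.artifact, "")
        report.append({
            "id": item.item_id,
            "description": item.description,
            "artifact": item.artifact,
            "met": item.predicate(evidence),
        })
    return report


if __name__ == "__main__":
    bundle = freeze_session(
        "https://example.test/login",
        {"dom": "<input type='password'>", "network": "POST /collect"},
    )
    checklist = [
        ChecklistItem("CHK-1", "Page requests a password", "dom",
                      lambda text: "password" in text),
        ChecklistItem("CHK-2", "Form data is sent off-page", "network",
                      lambda text: text.startswith("POST")),
    ]
    for row in adjudicate(bundle, checklist):
        print(row["id"], row["met"])
```

Keeping the bundle immutable means the adjudication step can be rerun or audited later against exactly the evidence that was captured, which is the property the abstract attributes to freezing the session.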
