Canary in Twitter Mine: Collecting Phishing Reports from Experts and Non-experts
Hiroki Nakano, Daiki Chiba, Takashi Koide, Naoki Fukushi, Takeshi Yagi, Takeo Hariu, Katsunari Yoshioka, Tsutomu Matsumoto

TL;DR
This paper introduces CrowdCanary, a Twitter-based system that collects and characterizes phishing reports from users, enabling rapid identification of phishing URLs and insights into user-reported attack details.
Contribution
The study presents a novel Twitter-based approach for real-time phishing report collection and analysis, including extraction of attack details from non-expert users.
Findings
CrowdCanary identified 35,432 phishing URLs in three months.
90.2% of these URLs were later detected by anti-virus engines.
Non-expert reports include brand names, attack details, and landing page information.
Abstract
The rise in phishing attacks via e-mail and short message service (SMS) has not slowed down at all. The first thing we need to do to combat the ever-increasing number of phishing attacks is to collect and characterize more phishing cases that reach end users. Without understanding these characteristics, anti-phishing countermeasures cannot evolve. In this study, we propose an approach using Twitter as a new observation point to immediately collect and characterize phishing cases via e-mail and SMS that evade countermeasures and reach users. Specifically, we propose CrowdCanary, a system capable of structurally and accurately extracting phishing information (e.g., URLs and domains) from tweets about phishing by users who have actually discovered or encountered it. In our three months of live operation, CrowdCanary identified 35,432 phishing URLs out of 38,935 phishing reports, 31,960…
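The abstract describes extracting phishing information such as URLs and domains from report tweets. The following is only a minimal sketch of that kind of extraction, not CrowdCanary's actual method. It assumes that reporters sometimes "defang" URLs (writing `hxxp` for `http` or `[.]` for `.`), which the text above does not state. The names `refang` and `extract_urls_and_domains` and the regular expression are hypothetical and chosen for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern for http/https URLs inside free text.
URL_PATTERN = re.compile(r"\bhttps?://[^\s<>\"']+", re.IGNORECASE)


def refang(text: str) -> str:
    """Undo two common defanging conventions (an assumption, not from the paper)."""
    text = re.sub(r"\[\.\]|\(\.\)", ".", text)  # example[.]com -> example.com
    text = re.sub(r"\bhxxp(s?)://", r"http\1://", text, flags=re.IGNORECASE)
    return text


def extract_urls_and_domains(report: str) -> list[tuple[str, str]]:
    """Return (url, hostname) pairs found in a single report text."""
    results = []
    for match in URL_PATTERN.finditer(refang(report)):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?)")
        try:
            host = urlsplit(url).hostname
        except ValueError:
            # Malformed host (for example, stray brackets); skip this candidate.
            continue
        if host:
            results.append((url, host))
    return results
```

For example, calling `extract_urls_and_domains("Fake delivery SMS: hxxps://example[.]com/login, be careful")` yields one pair containing the restored URL and the hostname `example.com`. A real pipeline would also need to decide which tweets are phishing reports in the first place, which this sketch does not attempt.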
Taxonomy
Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection
