DomainHarvester: Harvesting Infrequently Visited Yet Trustworthy Domain Names
Daiki Chiba, Hiroki Nakano, Takashi Koide

TL;DR
DomainHarvester is a novel system that creates trust-based allow lists that include infrequently visited yet legitimate domains. It does this by leveraging the web's hyperlink structure and machine learning, strengthening cybersecurity defenses.
Contribution
It introduces a bottom-up approach using hyperlink analysis and Transformer-based trust assessment to include trustworthy infrequent domains in allow lists, surpassing traditional popularity-based methods.
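The bottom-up idea can be illustrated with a minimal sketch. It starts from seed pages, follows their hyperlinks, and collects the domain names they point to. A trust scorer then filters those domains into an allow list. Everything here is a hypothetical illustration, not the authors' implementation. The names `harvest_domains`, `build_allow_list`, and `trust_score` are invented for this example. The pages are passed in as `{url: html}` strings rather than crawled. Stripping `www.` is a crude stand-in for reducing hosts to registrable domains. `trust_score` is a placeholder for the paper's Transformer-based classifier.

```python
from html.parser import HTMLParser
from typing import Callable, Iterable
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects href targets of <a> tags, resolved against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def harvest_domains(pages: dict[str, str]) -> set[str]:
    """Return hostnames linked from the given {url: html} seed pages (hypothetical helper)."""
    domains: set[str] = set()
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            parts = urlsplit(link)
            # Keep only web links; skip mailto:, javascript:, etc.
            if parts.scheme in ("http", "https") and parts.hostname:
                host = parts.hostname  # .hostname is already lowercased
                # Simplification (assumption): drop a leading "www."; a real system
                # would more likely use the Public Suffix List to get eTLD+1.
                if host.startswith("www."):
                    host = host[4:]
                domains.add(host)
    return domains


def build_allow_list(domains: Iterable[str],
                     trust_score: Callable[[str], float],
                     threshold: float = 0.5) -> list[str]:
    """Keep domains whose score meets the threshold; trust_score is a stand-in model."""
    return sorted(d for d in domains if trust_score(d) >= threshold)
```

For example, a seed page at `https://seed.example/` that links to `https://www.Shop.example/x`, to `/about`, and to a `mailto:` address yields `{"shop.example", "seed.example"}`. In this sketch the scorer is injected as a plain callable, so a trained model could replace the placeholder without changing the harvesting code.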
Findings
Allow lists with minimal overlap with existing top lists
Significantly reduces malicious domain inclusion
Enhances security and inclusivity in domain filtering
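"Minimal overlap" can be quantified in several ways. The summary does not state which metric the authors used. As a hedged illustration, the sketch below computes two common set measures between a harvested allow list and a popularity-based top list. The first is the share of allow-list domains that also appear in the top list. The second is the Jaccard similarity. The function names and sample domains are hypothetical.

```python
def overlap_ratio(allow_list: set[str], top_list: set[str]) -> float:
    """Fraction of allow-list domains that also appear in the top list."""
    if not allow_list:
        return 0.0
    return len(allow_list & top_list) / len(allow_list)


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


harvested = {"a.example", "b.example", "c.example", "d.example"}
top_list = {"a.example", "x.example"}
print(overlap_ratio(harvested, top_list))  # 0.25
print(jaccard(harvested, top_list))        # 0.2
```

A low `overlap_ratio` means most harvested domains are absent from the top list, which is the kind of result the first finding describes.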
Abstract
In cybersecurity, allow lists play a crucial role in distinguishing safe websites from potential threats. Conventional methods for compiling allow lists, focusing heavily on website popularity, often overlook infrequently visited legitimate domains. This paper introduces DomainHarvester, a system aimed at generating allow lists that include trustworthy yet infrequently visited domains. By adopting an innovative bottom-up methodology that leverages the web's hyperlink structure, DomainHarvester identifies legitimate yet underrepresented domains. The system uses seed URLs to gather domain names, employing machine learning with a Transformer-based approach to assess their trustworthiness. DomainHarvester was used to produce two distinct allow lists: one with a global focus and another emphasizing local relevance. Compared to six existing top lists, DomainHarvester's allow lists show minimal…
Taxonomy
Topics: Advanced Data Storage Technologies · Privacy-Preserving Technologies in Data · Cryptography and Data Security
