A machine learning approach for detecting CNAME cloaking-based tracking on the Web
Ha Dao, Kensuke Fukuda

TL;DR
This paper presents a supervised machine learning method to detect CNAME cloaking-based tracking on websites without relying on on-demand DNS lookup APIs, outperforming existing filter lists.
Contribution
The authors develop a novel ML-based detection approach that works without on-demand DNS lookups at detection time and identifies both the sites and the requests involved in CNAME cloaking-based tracking.
Findings
The approach achieves F1 scores of 0.790 for sites and 0.885 for requests.
It outperforms well-known tracking filter lists.
Features like script count and URL length are key discriminators.
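As a rough illustration of the findings above, the sketch below computes two site-level features of the kind the summary names (script count and URL length) and the F1 score used to report results. The function names, the ".js" heuristic for spotting scripts, and the use of a mean URL length are assumptions made for this example; the paper's exact feature definitions are not given here.

```python
from typing import Dict, Iterable
from urllib.parse import urlsplit


def is_script(url: str) -> bool:
    # Assumption: treat a request as a script if its path ends in ".js".
    return urlsplit(url).path.lower().endswith(".js")


def site_features(request_urls: Iterable[str]) -> Dict[str, float]:
    # Hypothetical site-level features built from the requests of one site.
    urls = list(request_urls)
    script_count = sum(1 for u in urls if is_script(u))
    mean_len = sum(len(u) for u in urls) / len(urls) if urls else 0.0
    return {"script_count": float(script_count), "mean_url_length": mean_len}


def f1_score(tp: int, fp: int, fn: int) -> float:
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For instance, a classifier with 8 true positives, 2 false positives, and 2 false negatives has precision and recall of 0.8 each, so its F1 score is also 0.8.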
Abstract
Various in-browser privacy protection techniques have been designed to protect end-users from third-party tracking. In an arms race against these countermeasures, tracking providers developed a new technique, CNAME cloaking-based tracking, to evade browsers that block third-party cookies and requests. To detect this technique, browser extensions require on-demand DNS lookup APIs. This feature, however, is supported only by the Firefox browser. In this paper, we propose a supervised machine learning-based method to detect CNAME cloaking-based tracking without the on-demand DNS lookup. Our goal is to detect both sites and requests linked to CNAME cloaking-related tracking. We crawl a list of target sites and store all HTTP/HTTPS requests with their attributes. Then we label all instances automatically by looking up the CNAME record of each subdomain, and applying…
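The labeling idea in the abstract can be sketched as follows: a request counts as CNAME-cloaked when it goes to a subdomain of the visited site whose CNAME record points to a different domain operated by a tracker. This is a minimal sketch under stated assumptions, not the authors' pipeline. The abstract is cut off before it names the second labeling step, so the `tracker_domains` set is a placeholder; `lookup_cname` is a hypothetical callable standing in for a real DNS query; and `registered_domain` naively takes the last two labels, where a real implementation would consult the Public Suffix List.

```python
from typing import Callable, Optional, Set
from urllib.parse import urlsplit


def registered_domain(host: str) -> str:
    # Naive assumption: the registered domain is the last two labels.
    # This is wrong for suffixes such as "co.uk".
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_cname_cloaked(
    page_url: str,
    request_url: str,
    lookup_cname: Callable[[str], Optional[str]],  # hypothetical DNS helper
    tracker_domains: Set[str],                     # placeholder tracker list
) -> bool:
    page_host = urlsplit(page_url).hostname or ""
    req_host = urlsplit(request_url).hostname or ""
    if not page_host or not req_host:
        return False
    site = registered_domain(page_host)
    # Cloaking hides behind a first-party subdomain, so skip other sites.
    if registered_domain(req_host) != site:
        return False
    target = lookup_cname(req_host)
    if target is None:
        return False  # no CNAME record, nothing is cloaked
    target_site = registered_domain(target)
    # Cloaked if the alias leaves the site and lands on a known tracker.
    return target_site != site and target_site in tracker_domains
```

Passing the lookup in as a callable keeps the sketch testable offline, for example with `{"metrics.example.com": "example.tracker.net."}.get` as a stand-in resolver.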
Taxonomy
Topics: Spam and Phishing Detection · Internet Traffic Analysis and Secure E-voting · Network Security and Intrusion Detection
