STAR: Semantic-Traffic Alignment and Retrieval for Zero-Shot HTTPS Website Fingerprinting
Yifei Cheng, Yujia Zhu, Baiyang Li, Xinhao Deng, Yitong Cai, Yaochen Ren, Qingyun Liu

TL;DR
This paper introduces STAR, a zero-shot retrieval method that aligns encrypted traffic traces with website profiles, enabling accurate website identification without prior site-specific training, thus enhancing privacy analysis and attack capabilities.
Contribution
STAR reformulates website fingerprinting as a zero-shot cross-modal retrieval problem, introducing a joint embedding space and a dual-encoder architecture trained on large-scale data.
Findings
Achieves 87.9% top-1 accuracy on unseen websites
Outperforms supervised and few-shot baselines
Identifies semantic leakage as a key privacy risk
Abstract
Modern HTTPS mechanisms such as Encrypted Client Hello (ECH) and encrypted DNS improve privacy but remain vulnerable to website fingerprinting (WF) attacks, where adversaries infer visited sites from encrypted traffic patterns. Existing WF methods rely on supervised learning with site-specific labeled traces, which limits scalability and fails to handle previously unseen websites. We address these limitations by reformulating WF as a zero-shot cross-modal retrieval problem and introducing STAR. STAR learns a joint embedding space for encrypted traffic traces and crawl-time logic profiles using a dual-encoder architecture. Trained on 150K automatically collected traffic-logic pairs with contrastive and consistency objectives and structure-aware augmentation, STAR retrieves the most semantically aligned profile for a trace without requiring target-side traffic during training. Experiments…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsInternet Traffic Analysis and Secure E-voting · Spam and Phishing Detection · Network Security and Intrusion Detection
