Building K-Anonymous User Cohorts with Consecutive Consistent Weighted Sampling (CCWS)
Xinyi Zheng, Weijie Zhao, Xiaoyun Li, Ping Li

TL;DR
This paper introduces CCWS, a scalable algorithm for building user cohorts that ensures K-anonymity, improving privacy-preserving cohort grouping in digital advertising over existing hashing methods.
Contribution
The paper proposes a novel cohort building algorithm that combines consistent weighted sampling with hierarchical clustering and guarantees K-anonymity in large-scale datasets by enforcing a lower bound on cohort size.
Findings
CCWS outperforms SignRP, MinHash, and vanilla CWS in accuracy and efficiency.
Evaluated on a LinkedIn dataset of over 70 million users and ads campaigns, demonstrating scalability.
Achieves substantial improvements in cohort quality and privacy protection.
Abstract
To retrieve personalized campaigns and creatives while protecting user privacy, digital advertising is shifting from member-based identity to cohort-based identity. Under such an identity regime, an accurate and efficient cohort building algorithm is desired to group users with similar characteristics. In this paper, we propose a scalable K-anonymous cohort building algorithm called consecutive consistent weighted sampling (CCWS). The proposed method combines the spirit of the (p-powered) consistent weighted sampling and hierarchical clustering, so that K-anonymity is ensured by enforcing a lower bound on the size of cohorts. Evaluations on a LinkedIn dataset consisting of more than 70M users and ads campaigns demonstrate that CCWS achieves substantial improvements over several hashing-based methods including sign random projections (SignRP), minwise hashing (MinHash), as well as…
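The abstract names two building blocks: consistent weighted sampling and a lower bound on cohort size. The sketch below illustrates both in their simplest form, and it is not the paper's CCWS procedure. It assumes the widely used Ioffe-style formulation of vanilla CWS, in which two nonnegative vectors produce the same hash with probability equal to their weighted Jaccard (min-max) similarity. The function names `cws_hashes`, `weighted_jaccard`, and `is_k_anonymous` are hypothetical, chosen only for illustration.

```python
import numpy as np


def cws_hashes(weights, num_hashes, seed=0):
    """Vanilla consistent weighted sampling (Ioffe-style sketch, an assumption).

    Returns an int array of shape (num_hashes, 2); each row is one hash (i*, t*).
    """
    w = np.asarray(weights, dtype=float)
    dim = w.shape[0]
    # The same seed must be used for every user so that the random
    # variables attached to each feature index are shared across users.
    rng = np.random.default_rng(seed)
    r = rng.gamma(2.0, 1.0, size=(num_hashes, dim))
    c = rng.gamma(2.0, 1.0, size=(num_hashes, dim))
    beta = rng.uniform(0.0, 1.0, size=(num_hashes, dim))

    nz = np.flatnonzero(w > 0)  # only positive weights can be sampled
    if nz.size == 0:
        raise ValueError("weights must contain at least one positive entry")

    t = np.floor(np.log(w[nz]) / r[:, nz] + beta[:, nz])
    log_y = r[:, nz] * (t - beta[:, nz])
    log_a = np.log(c[:, nz]) - log_y - r[:, nz]  # log of c / (y * exp(r))

    k = np.argmin(log_a, axis=1)
    rows = np.arange(num_hashes)
    return np.stack([nz[k], t[rows, k].astype(np.int64)], axis=1)


def weighted_jaccard(u, v):
    """Min-max similarity: sum of elementwise minima over sum of maxima."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.minimum(u, v).sum() / np.maximum(u, v).sum()


def is_k_anonymous(cohort_ids, k):
    """True if every cohort appearing in cohort_ids has at least k members."""
    _, counts = np.unique(np.asarray(cohort_ids), return_counts=True)
    return bool(counts.min() >= k)


if __name__ == "__main__":
    u = [1.0, 2.0, 0.0, 3.0]
    v = [2.0, 1.0, 1.0, 3.0]
    hu = cws_hashes(u, num_hashes=4000, seed=7)
    hv = cws_hashes(v, num_hashes=4000, seed=7)
    collision_rate = np.mean(np.all(hu == hv, axis=1))
    print("weighted Jaccard:", weighted_jaccard(u, v))
    print("empirical collision rate:", collision_rate)
    print("K-anonymous for K=2:", is_k_anonymous([0, 0, 1, 1, 1], 2))
```

For the two example vectors the weighted Jaccard similarity is 5/8, and the fraction of matching hashes should approach that value as the number of hashes grows. How CCWS applies such hashes consecutively and merges undersized cohorts is described in the paper itself and is not reproduced here.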
Taxonomy
Topics: Spam and Phishing Detection · Internet Traffic Analysis and Secure E-voting · Privacy-Preserving Technologies in Data
