SAGE: Scalable Automatic Gating Ensemble for Confident Negative Harvesting in Fraud Detection
Sudheer Tubati, Amit Goyal

TL;DR
SAGE is a scalable, counterfactual-aware ensemble method for music streaming fraud detection. It harvests confident negatives from unlabeled data even when legitimate edge cases, such as super-fans and sleep-music sessions, closely resemble coordinated fraud.
Contribution
SAGE introduces a negative harvesting approach that combines stratified sampling with a modular gating ensemble to improve fraud detection accuracy.
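The stratified sampling step, which the abstract describes as SimHash-based, might look roughly like the sketch below. It uses the standard random-hyperplane form of SimHash to assign each feature vector to a bucket, then draws a capped number of samples per bucket so that sparse regions of feature space stay represented. The function names, the `n_bits` and `per_bucket` parameters, and the per-bucket cap are illustrative assumptions; the summary does not specify how the authors configure this step.

```python
import numpy as np

def simhash_buckets(X, n_bits=8, seed=0):
    """Assign each row of X to a bucket id via random-hyperplane SimHash."""
    rng = np.random.default_rng(seed)
    # One random hyperplane per bit; the sign of the projection gives the bit.
    planes = rng.standard_normal((X.shape[1], n_bits))
    bits = (X @ planes) >= 0
    # Pack the bit vector into a single integer bucket id.
    weights = 1 << np.arange(n_bits)
    return bits.astype(np.int64) @ weights

def stratified_sample(X, per_bucket, n_bits=8, seed=0):
    """Return sorted row indices with at most `per_bucket` rows per bucket.

    Hypothetical helper: the cap-per-bucket rule is an assumption,
    not the paper's stated procedure.
    """
    rng = np.random.default_rng(seed)
    buckets = simhash_buckets(X, n_bits, seed)
    chosen = []
    for b in np.unique(buckets):
        idx = np.flatnonzero(buckets == b)
        take = min(per_bucket, idx.size)
        chosen.append(rng.choice(idx, size=take, replace=False))
    return np.sort(np.concatenate(chosen))
```

Increasing `n_bits` yields finer strata at the cost of more near-empty buckets, so in practice the two parameters would be tuned together.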
Findings
Achieves high precision and recall on held-out data.
Generalizes across different fraud detection domains.
Addresses representation bias in positive-unlabeled learning.
Abstract
Music streaming fraud, where bad actors artificially inflate stream counts to manipulate chart rankings and royalty payments, poses a significant threat to streaming services and legitimate content creators. Traditional fraud detection approaches struggle with a critical challenge: many legitimate edge cases, including super-fans and sleep-music sessions, exhibit activity patterns that closely mimic those of coordinated fraud. We present SAGE, a novel counterfactual-aware negative harvesting approach that combines SimHash-based stratified sampling with a modular gating ensemble for confident negative identification from unlabeled data. Our ensemble architecture employs pluggable statistical gates (currently instantiated with Mahalanobis distance and k-NN density) with configurable voting thresholds enabling adaptive precision-recall trade-offs. This addresses the representation bias…
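As a minimal sketch of the gating idea in the abstract, the code below implements two pluggable gates (Mahalanobis distance and a k-NN density proxy) and a configurable vote threshold. Several details are assumptions made for illustration: each gate is calibrated on labeled fraud positives, a gate votes "negative" when an unlabeled point lies beyond the 95th percentile of the positives' own scores, and the names `mahalanobis_gate`, `knn_gate`, `harvest_negatives`, and `min_votes` are hypothetical. The abstract does not specify these choices.

```python
import numpy as np

def mahalanobis_gate(X_pos, X_unl, quantile=0.95):
    """Vote True for unlabeled rows far from the positive distribution."""
    mu = X_pos.mean(axis=0)
    # Small ridge term keeps the covariance invertible (assumed choice).
    cov = np.cov(X_pos, rowvar=False) + 1e-6 * np.eye(X_pos.shape[1])
    inv = np.linalg.inv(cov)

    def dist(X):
        d = X - mu
        return np.sqrt(np.einsum("ij,jk,ik->i", d, inv, d))

    threshold = np.quantile(dist(X_pos), quantile)
    return dist(X_unl) > threshold

def knn_gate(X_pos, X_unl, k=5, quantile=0.95):
    """Vote True where the distance to the k-th nearest positive is large,
    i.e. where the density of positives is low."""
    def kth_dist(X, skip_self):
        d = np.linalg.norm(X[:, None, :] - X_pos[None, :, :], axis=2)
        d.sort(axis=1)
        # When X is X_pos itself, column 0 is the zero self-distance.
        return d[:, k if skip_self else k - 1]

    threshold = np.quantile(kth_dist(X_pos, True), quantile)
    return kth_dist(X_unl, False) > threshold

def harvest_negatives(X_pos, X_unl, gates, min_votes):
    """Keep unlabeled rows that at least `min_votes` gates call negative."""
    votes = np.sum([gate(X_pos, X_unl) for gate in gates], axis=0)
    return votes >= min_votes
```

With two gates, `min_votes=2` requires unanimous agreement and yields fewer but more confident negatives, while `min_votes=1` accepts any single vote and harvests more. This is one way to realize the adaptive precision-recall trade-off the abstract mentions.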
