SAGE: Scalable Automatic Gating Ensemble for Confident Negative Harvesting in Fraud Detection

Sudheer Tubati; Amit Goyal

arXiv:2605.20157·cs.LG·May 20, 2026

TL;DR

SAGE is a scalable, counterfactual-aware ensemble method for music-streaming fraud detection. It harvests confident negatives from unlabeled data, even when legitimate edge cases such as super-fans and sleep-music sessions closely resemble coordinated fraud.

Contribution

It introduces a novel negative harvesting approach combining stratified sampling with a modular gating ensemble to improve fraud detection accuracy.
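The abstract names SimHash as the basis for the stratified sampling step. As a rough illustration only, the sketch below hashes feature vectors with random hyperplanes and draws an equal number of rows from each hash bucket. The function names, the `n_bits` and `per_bucket` parameters, and the uniform per-bucket draw are assumptions made for this example; the paper's actual sampling scheme is not described in this summary.

```python
import numpy as np

def simhash_buckets(X, n_bits=8, seed=0):
    """Assign each row of X a bucket id from the signs of random projections.

    Hypothetical helper: n_bits random hyperplanes give an n_bits-bit signature,
    so vectors pointing in similar directions tend to share a bucket.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))
    bits = (X @ planes) > 0                      # one sign bit per hyperplane
    weights = 2 ** np.arange(n_bits)             # pack the bits into an integer
    return bits.astype(np.int64) @ weights

def stratified_sample(X, per_bucket=10, n_bits=8, seed=0):
    """Return sorted row indices with at most per_bucket rows from each bucket."""
    rng = np.random.default_rng(seed)
    buckets = simhash_buckets(X, n_bits=n_bits, seed=seed)
    chosen = []
    for b in np.unique(buckets):
        idx = np.flatnonzero(buckets == b)
        take = min(per_bucket, idx.size)
        chosen.append(rng.choice(idx, size=take, replace=False))
    return np.sort(np.concatenate(chosen))
```

Sampling per bucket rather than uniformly keeps rare behavior patterns represented in the candidate pool, which is one plausible way to reduce representation bias.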

Findings

01. Achieves high precision and recall on held-out data.

02. Generalizes across different fraud detection domains.

03. Addresses representation bias in positive-unlabeled learning.

Abstract

Music streaming fraud, where bad actors artificially inflate stream counts to manipulate chart rankings and royalty payments, poses a significant threat to streaming services and legitimate content creators. Traditional fraud detection approaches struggle with a critical challenge: many legitimate edge cases, including super-fans and sleep-music sessions, exhibit activity patterns that closely mimic those of coordinated fraud. We present SAGE, a novel counterfactual-aware negative harvesting approach that combines SimHash-based stratified sampling with a modular gating ensemble for confident negative identification from unlabeled data. Our ensemble architecture employs pluggable statistical gates (currently instantiated with Mahalanobis distance and k-NN density) with configurable voting thresholds enabling adaptive precision-recall trade-offs. This addresses the representation bias…
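To make the gating idea concrete, here is a minimal sketch of a pluggable gate ensemble with a configurable vote threshold, using the two gates the abstract mentions (Mahalanobis distance and k-NN density). It assumes that a gate votes "confident negative" when an unlabeled point lies far from the known-positive (fraud) examples, and that each gate's cutoff is a quantile of the positives' own scores. Those choices, along with every function name and parameter here, are illustrative assumptions and not the authors' implementation.

```python
import numpy as np

def mahalanobis_gate(X, positives, quantile=0.95):
    """Vote True for rows of X farther from the positives than most positives are."""
    mu = positives.mean(axis=0)
    # Small ridge term keeps the covariance invertible (an assumption of this sketch).
    cov = np.cov(positives, rowvar=False) + 1e-6 * np.eye(positives.shape[1])
    inv = np.linalg.inv(cov)

    def dist(A):
        d = A - mu
        return np.sqrt(np.einsum("ij,jk,ik->i", d, inv, d))

    return dist(X) > np.quantile(dist(positives), quantile)

def knn_density_gate(X, positives, k=5, quantile=0.95):
    """Vote True for rows of X whose k-th nearest positive is unusually far away."""
    def kth_dist(A, skip_self):
        d = np.linalg.norm(A[:, None, :] - positives[None, :, :], axis=2)
        d.sort(axis=1)
        # For positives scored against themselves, column 0 is the zero self-distance.
        return d[:, k if skip_self else k - 1]

    cutoff = np.quantile(kth_dist(positives, True), quantile)
    return kth_dist(X, False) > cutoff

def harvest_negatives(X, positives, gates, min_votes):
    """Keep rows that at least min_votes gates flag as confident negatives."""
    votes = np.sum([gate(X, positives) for gate in gates], axis=0)
    return votes >= min_votes
```

Under these assumptions, raising `min_votes` toward the number of gates favors precision (fewer but cleaner harvested negatives), while lowering it favors recall. Any new gate only needs to accept `(X, positives)` and return a boolean array.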
