Tradeoffs in Streaming Binary Classification under Limited Inspection Resources
Parisa Hassanzadeh, Danial Dervovic, Samuel Assefa, Prashant Reddy, Manuela Veloso

TL;DR
This paper analyzes the tradeoffs in streaming binary classification when inspection resources are limited, comparing different suspicious event selection methods and their effectiveness in imbalanced data scenarios.
Contribution
It introduces a model for sequential event inspection with limited capacity, providing analytical bounds and empirical validation for various selection strategies.
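The inspection setting can be illustrated with a small sketch. This is not the authors' model. It assumes a simple greedy policy: events are scanned in arrival order, and each event whose classifier score clears a fixed (static) threshold is inspected until a capacity budget runs out. The function name `inspect_stream` and its parameters are hypothetical. The detection rate is the fraction of minority-class events (label 1) that end up inspected.

```python
from typing import Sequence, Tuple


def inspect_stream(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float,
    capacity: int,
) -> Tuple[int, float]:
    """Greedy static-threshold inspection of an event stream (illustrative).

    Events are processed in arrival order. An event is inspected if its
    score is at least `threshold` and inspection capacity remains; once the
    budget is spent, all later events go uninspected. Returns the number of
    inspections used and the detection rate, i.e. the fraction of
    minority-class events (label 1) that were inspected.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = sum(1 for y in labels if y == 1)
    used = 0
    detected = 0
    for score, label in zip(scores, labels):
        if used >= capacity:
            break  # capacity exhausted: remaining events are not inspected
        if score >= threshold:
            used += 1
            if label == 1:
                detected += 1
    detection_rate = detected / positives if positives else 0.0
    return used, detection_rate


# Six events, three of them in the minority class.
scores = [0.9, 0.2, 0.8, 0.95, 0.1, 0.7]
labels = [1, 0, 0, 1, 0, 1]
print(inspect_stream(scores, labels, threshold=0.75, capacity=2))   # → (2, 0.3333333333333333)
print(inspect_stream(scores, labels, threshold=0.75, capacity=10))  # → (3, 0.6666666666666666)
```

With a capacity of 2, the false alarm scored 0.8 consumes the last inspection slot before the positive scored 0.95 arrives. This shows how a static threshold can exhaust the budget early in the stream.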
Findings
Adaptive thresholds improve detection rates under resource constraints.
Class imbalance significantly affects the tradeoff between detection and inspection capacity.
Analytical bounds closely match empirical results on real datasets.
Abstract
Institutions are increasingly relying on machine learning models to identify and alert on abnormal events, such as fraud, cyber attacks and system failures. These alerts often need to be manually investigated by specialists. Given the operational cost of manual inspections, the suspicious events are selected by alerting systems with carefully designed thresholds. In this paper, we consider an imbalanced binary classification problem, where events arrive sequentially and only a limited number of suspicious events can be inspected. We model the event arrivals as a non-homogeneous Poisson process, and compare various suspicious event selection methods including those based on static and adaptive thresholds. For each method, we analytically characterize the tradeoff between the minority-class detection rate and the inspection capacity as a function of the data class imbalance and the…
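The abstract models arrivals as a non-homogeneous Poisson process and compares static and adaptive thresholds. A minimal simulation sketch of that setting follows, under several assumptions that do not come from the paper. Arrivals are generated by Lewis-Shedler thinning, a standard simulation method. The intensity is a hypothetical sinusoid, 5 + 4 sin(2πt/24). Classifier scores are assumed to be Uniform(0, 1). The adaptive rule in `adaptive_threshold` is also hypothetical: it sets the threshold so that the expected number of remaining events above it matches the remaining capacity. All function names here are illustrative.

```python
import math
import random
from typing import Callable, List, Tuple


def nhpp_thinning(
    rate: Callable[[float], float],
    rate_max: float,
    horizon: float,
    rng: random.Random,
) -> List[float]:
    """Arrival times of a non-homogeneous Poisson process on [0, horizon].

    Lewis-Shedler thinning: draw candidates from a homogeneous Poisson
    process with intensity `rate_max` (which must bound rate(t) on the
    interval) and keep a candidate at time t with probability
    rate(t) / rate_max.
    """
    times: List[float] = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)  # next candidate inter-arrival gap
        if t > horizon:
            return times
        if rng.random() < rate(t) / rate_max:
            times.append(t)


def adaptive_threshold(remaining_capacity: int, expected_remaining: float) -> float:
    """Hypothetical capacity-aware threshold, assuming scores ~ Uniform(0, 1).

    Picks tau so that the expected number of remaining events scoring at
    least tau, expected_remaining * (1 - tau), equals the remaining capacity.
    """
    if remaining_capacity <= 0:
        return math.inf  # budget exhausted: inspect nothing further
    if expected_remaining <= remaining_capacity:
        return 0.0  # enough capacity for every expected event
    return 1.0 - remaining_capacity / expected_remaining


def run_adaptive(capacity: int, horizon: float, seed: int = 0) -> Tuple[int, int]:
    """Simulate sinusoidal arrivals with uniform scores; return (arrivals, inspected)."""
    rng = random.Random(seed)
    period = 24.0
    omega = 2.0 * math.pi / period

    def rate(t: float) -> float:
        # Intensity oscillates between 1 and 9 events per unit time.
        return 5.0 + 4.0 * math.sin(omega * t)

    def cumulative(t: float) -> float:
        # Closed-form integral of rate(s) ds over [0, t].
        return 5.0 * t + (4.0 / omega) * (1.0 - math.cos(omega * t))

    arrivals = nhpp_thinning(rate, 9.0, horizon, rng)
    remaining = capacity
    for t in arrivals:
        # Expected events from now to the horizon, counting the current one.
        expected = 1.0 + cumulative(horizon) - cumulative(t)
        tau = adaptive_threshold(remaining, expected)
        score = rng.random()  # hypothetical classifier score ~ Uniform(0, 1)
        if score >= tau:
            remaining -= 1
    return len(arrivals), capacity - remaining


print(run_adaptive(capacity=20, horizon=48.0, seed=1))
```

The threshold is recomputed at every arrival. It therefore rises when many events are still expected relative to the remaining budget and falls as the horizon approaches with capacity left over. By construction, the number of inspections never exceeds `capacity`.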
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Imbalanced Data Classification Techniques · Advanced Statistical Process Monitoring
