Detecting Adversarial Samples Using Density Ratio Estimates
Lovedeep Gondara

TL;DR
This paper introduces a density ratio estimation method for detecting adversarial samples in machine learning models, offering an efficient, model-agnostic approach applicable to various data types and adversarial generation techniques.
Contribution
It proposes a novel density ratio estimation technique for adversarial sample detection and a method to generate adversarial samples with density ratio constraints.
Findings
Effective detection of adversarial samples across different adversarial sample generation methods
Works with single- and multi-channel data
Generates adversarial samples with density ratio preservation
Abstract
Machine learning models, especially those based on deep architectures, are used in everyday applications ranging from self-driving cars to medical diagnostics. Such models have been shown to be dangerously susceptible to adversarial samples: indistinguishable from real samples to the human eye, adversarial samples lead to incorrect classifications with high confidence. The impact of adversarial samples is far-reaching, and their efficient detection remains an open problem. We propose to use direct density ratio estimation as an efficient, model-agnostic measure to detect adversarial samples. Our proposed method works equally well with single- and multi-channel samples, and with different adversarial sample generation methods. We also propose a method to use density ratio estimates for generating adversarial samples with an added constraint of preserving the density ratio.
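To make the idea of direct density ratio estimation concrete, here is a minimal sketch in Python. It is not the paper's implementation: the abstract does not say which estimator or decision rule is used, so the sketch assumes a least-squares estimator in the style of uLSIF with a Gaussian kernel. It also assumes samples are flattened to vectors (for example, an image of any channel count becomes one row). The function names, `sigma`, `lam`, and `n_centers` are all hypothetical choices for illustration.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    """Gaussian kernel matrix between rows of x and rows of centers."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def fit_ulsif(x_ref, x_test, sigma=1.0, lam=0.1, n_centers=100, seed=0):
    """Fit r(x) ~ p_ref(x) / p_test(x) as a kernel expansion.

    Assumption: a uLSIF-style regularized least-squares fit; the source
    does not name the estimator it uses.
    """
    rng = np.random.default_rng(seed)
    n_centers = min(n_centers, len(x_ref))
    # Kernel centers are a random subset of the reference (clean) samples.
    centers = x_ref[rng.choice(len(x_ref), size=n_centers, replace=False)]
    phi_ref = gaussian_kernel(x_ref, centers, sigma)
    phi_test = gaussian_kernel(x_test, centers, sigma)
    H = phi_test.T @ phi_test / len(x_test)  # second moment under p_test
    h = phi_ref.mean(axis=0)                 # first moment under p_ref
    # Closed-form solution of the regularized least-squares problem.
    alpha = np.linalg.solve(H + lam * np.eye(n_centers), h)
    return centers, alpha, H, h


def density_ratio(x, centers, alpha, sigma=1.0):
    """Evaluate the fitted ratio at x, clipped at zero (a simplification)."""
    return np.maximum(gaussian_kernel(x, centers, sigma) @ alpha, 0.0)


def divergence_score(x_ref, x_test, sigma=1.0, lam=0.1, n_centers=100, seed=0):
    """Pearson-divergence-style score; near zero when the two sets match."""
    _, alpha, H, h = fit_ulsif(x_ref, x_test, sigma, lam, n_centers, seed)
    return float(h @ alpha - 0.5 * alpha @ H @ alpha - 0.5)
```

Under these assumptions, a batch of suspect samples could be flagged when its `divergence_score` against known clean samples exceeds a threshold calibrated on held-out clean batches. That thresholding rule is an illustration only, not something the abstract specifies.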
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
