Detecting Potential Local Adversarial Examples for Human-Interpretable Defense

Xavier Renard; Thibault Laugel; Marie-Jeanne Lesot; Christophe Marsala; Marcin Detyniecki

arXiv:1809.02397·stat.ML·September 10, 2018

TL;DR

This paper proposes a method to detect potential local adversarial examples in tabular data by identifying critical features influencing the classifier's decision, aiding human experts in fraud prevention.

Contribution

It offers a first proposition for identifying the features that are locally critical to a classifier's decision on tabular data, so that a human expert can check them and guard against fraud.

Findings

01 Offers a first proposition for detecting potential local adversarial examples on tabular data

02 Provides the locally critical features to a human expert, who can then control the provided information

03 Aims to help avoid fraud

Abstract

Machine learning models are increasingly used in industry to make decisions such as credit insurance approval. Some people may be tempted to manipulate specific variables, such as age or salary, to improve their chances of approval. In this ongoing work, we discuss, with a first proposition, the issue of detecting a potential local adversarial example on classical tabular data by providing a human expert with the features that are locally critical for the classifier's decision, so that the expert can control the provided information and avoid fraud.
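
To make the idea of "locally critical features" concrete, here is a minimal sketch in Python. It is not the authors' method, which this page does not describe. It is a simple one-feature-at-a-time search that illustrates the general idea: for an approved applicant, find how small a change to each feature would flip the decision, and show the expert the features that flip most easily. The classifier, the feature names (`age`, `salary_k`, `open_loans`), the coefficients, and the step sizes are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Applicant = Dict[str, float]


def toy_classifier(x: Applicant) -> int:
    """Hypothetical credit model: approve (1) when a linear score is >= 0."""
    score = x["age"] + 20 * x["salary_k"] - 10 * x["open_loans"] - 95
    return 1 if score >= 0 else 0


def locally_critical_features(
    predict: Callable[[Applicant], int],
    x: Applicant,
    step_sizes: Dict[str, float],
    max_steps: int = 20,
) -> List[Tuple[str, float]]:
    """Return (feature, smallest flipping change) pairs, easiest flips first.

    Each feature is varied on its own, in multiples of its step size, in
    both directions. Features that never flip the decision within
    max_steps are omitted.
    """
    base = predict(x)
    found: List[Tuple[str, float]] = []
    for name, step in step_sizes.items():
        flip = None
        for k in range(1, max_steps + 1):
            for sign in (1, -1):
                candidate = dict(x)  # copy so the original is untouched
                candidate[name] = x[name] + sign * k * step
                if predict(candidate) != base:
                    flip = sign * k * step
                    break
            if flip is not None:
                break
        if flip is not None:
            found.append((name, flip))
    # Rank by how many steps were needed, so features are comparable
    # even though they use different units.
    found.sort(key=lambda item: abs(item[1]) / step_sizes[item[0]])
    return found


applicant = {"age": 30.0, "salary_k": 3.5, "open_loans": 0.0}
steps = {"age": 1.0, "salary_k": 0.25, "open_loans": 1.0}
print(locally_critical_features(toy_classifier, applicant, steps))
# [('open_loans', 1.0), ('salary_k', -0.5), ('age', -6.0)]
```

In this toy case the applicant is approved, and the decision would flip with one additional open loan, a salary lower by 0.5 (in thousands), or an age lower by 6 years. An expert reviewing the file would therefore check the declared number of loans and the salary first. Varying one feature at a time ignores interactions between features, so this should be read as an illustration only.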
