TrustGAN: Training safe and trustworthy deep learning models through generative adversarial networks
Hélion du Mas des Bourboux

TL;DR
TrustGAN is a generative adversarial network pipeline designed to enhance the confidence estimation of deep learning models, enabling safer deployment by identifying unreliable predictions without altering the original model.
Contribution
It introduces a model-agnostic TrustGAN pipeline that improves confidence estimation in deep models without affecting their predictive accuracy, suitable for real-world, safety-critical applications.
Findings
TrustGAN reduces confidence on out-of-distribution samples.
The pipeline improves trustworthiness in image and signal classification tasks.
Code is publicly released for reproducibility.
Abstract
Deep learning models have been developed for a variety of tasks and are deployed every day to work in real conditions. Some of these tasks are critical, and the models need to be trusted and safe, e.g. for military communications or cancer diagnosis. These models are given real data, simulated data, or a combination of both, and are trained to be highly predictive on them. However, gathering enough real data, or simulating data that are representative of all real conditions, is costly, sometimes impossible due to confidentiality, and most of the time impossible altogether. Indeed, real conditions are constantly changing and are sometimes intractable. A solution is to deploy machine learning models that give predictions only when they are confident enough, and otherwise raise a flag or abstain. One issue is that standard models easily fail at detecting out-of-distribution samples where their predictions are…
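The "predict when confident, otherwise abstain" behaviour described in the abstract can be illustrated with a minimal sketch. This is not TrustGAN itself and is not taken from the paper's released code: it only shows the generic deployment rule of thresholding a classifier's maximum softmax probability. The function names (`softmax`, `predict_or_abstain`) and the threshold value of 0.9 are assumptions chosen for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(
    logits: Sequence[float], threshold: float = 0.9
) -> Tuple[Optional[int], float]:
    """Return (label, confidence), or (None, confidence) to abstain.

    The threshold is a hypothetical value; in practice it would be
    tuned on held-out data for the target application.
    """
    probs = softmax(logits)
    confidence = max(probs)
    if confidence < threshold:
        return None, confidence  # raise a flag / abstain
    return probs.index(confidence), confidence


# One clearly dominant class: the model answers.
confident_label, confident_score = predict_or_abstain([10.0, 0.0, 0.0])

# Equal scores for all classes: the model abstains.
unsure_label, unsure_score = predict_or_abstain([1.0, 1.0, 1.0])
```

Such a rule is only as good as the confidence score behind it. As the abstract notes, standard models can fail to detect out-of-distribution samples, which is the gap the paper targets.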
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis
