Do Deep Generative Models Know What They Don't Know?
Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan

TL;DR
This paper reveals that popular deep generative models often assign higher likelihoods to out-of-distribution images than to in-distribution ones, challenging their reliability for out-of-distribution detection.
Contribution
The study demonstrates that flow-based models, VAEs, and PixelCNNs cannot reliably distinguish in-distribution from out-of-distribution data based on likelihoods, and provides theoretical insights into this phenomenon.
Findings
Likelihoods do not reliably indicate in-distribution vs out-of-distribution inputs.
For flow-based models, the likelihood gap can be explained by the variance of the data and the curvature of the model.
Results caution against using density estimates for out-of-distribution detection.
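The findings above can be illustrated with a toy sketch. It is not the paper's experiment: a diagonal Gaussian stands in for a deep generative model, and the data are synthetic rather than CIFAR-10 or SVHN. All names (`fit_diag_gaussian`, `mean_log_likelihood`, `DIM`, the chosen standard deviations) are assumptions made for illustration. The model is fit to high-variance "in-distribution" samples and then scored on lower-variance "out-of-distribution" samples that share the same mean; the lower-variance set receives the higher average log-likelihood.

```python
import math
import random


def fit_diag_gaussian(data):
    """Maximum-likelihood fit of a diagonal Gaussian (per-dimension mean and variance)."""
    n, d = len(data), len(data[0])
    means = [sum(x[j] for x in data) / n for j in range(d)]
    variances = [sum((x[j] - means[j]) ** 2 for x in data) / n for j in range(d)]
    return means, variances


def log_likelihood(x, means, variances):
    """Log-density of one sample under the fitted diagonal Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - 0.5 * (xi - m) ** 2 / v
        for xi, m, v in zip(x, means, variances)
    )


def mean_log_likelihood(data, means, variances):
    """Average log-likelihood over a dataset."""
    return sum(log_likelihood(x, means, variances) for x in data) / len(data)


def sample(rng, n, d, std):
    """Draw n zero-mean samples in d dimensions with the given standard deviation."""
    return [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
DIM = 20  # hypothetical dimensionality, chosen only for the demo

train = sample(rng, 2000, DIM, std=1.0)     # "in-distribution" training data
test_in = sample(rng, 500, DIM, std=1.0)    # held-out in-distribution data
test_ood = sample(rng, 500, DIM, std=0.5)   # "OOD" data: same mean, lower variance

means, variances = fit_diag_gaussian(train)
ll_in = mean_log_likelihood(test_in, means, variances)
ll_ood = mean_log_likelihood(test_ood, means, variances)

print(f"mean log-likelihood, in-distribution: {ll_in:.2f}")
print(f"mean log-likelihood, OOD:             {ll_ood:.2f}")
print("OOD scored higher than in-distribution:", ll_ood > ll_in)
```

In this toy setting the gap comes entirely from the variance mismatch: samples concentrated near the mode of the fitted density score well even though the model was never trained on them. A threshold on likelihood alone would therefore accept the out-of-distribution set.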
Abstract
A neural network deployed in the wild may be asked to make predictions for inputs that were drawn from a different distribution than that of the training data. A plethora of work has demonstrated that it is easy to find or synthesize inputs for which a neural network is highly confident yet wrong. Generative models are widely viewed to be robust to such mistaken confidence as modeling the density of the input features can be used to detect novel, out-of-distribution inputs. In this paper we challenge this assumption. We find that the density learned by flow-based models, VAEs, and PixelCNNs cannot distinguish images of common objects such as dogs, trucks, and horses (i.e. CIFAR-10) from those of house numbers (i.e. SVHN), assigning a higher likelihood to the latter when the model is trained on the former. Moreover, we find evidence of this phenomenon when pairing several popular image…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
