Better Conditional Density Estimation for Neural Networks
Wesley Tansey, Karl Pichotta, James G. Scott

TL;DR
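This paper introduces two neural network approaches, Multiscale Nets (MSNs) and CDE Trend Filtering, for modeling the full joint conditional distribution over response variables. MSNs do best when data per feature is abundant, CDE Trend Filtering does best when data is limited, and both outperform the baseline models.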
Contribution
The paper proposes two new methods for conditional density estimation: Multiscale Nets, which recast the problem as a hierarchical classification task, and CDE Trend Filtering, which applies a graph trend filtering penalty to the logits of a multinomial classifier.
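The hierarchical classification idea can be illustrated with a minimal sketch. It assumes a one-dimensional response discretized into 2**depth bins and a binary tree of splits stored in heap order; the function name `bin_probabilities`, the heap layout, and the example numbers are illustrative assumptions, not taken from the paper. Each bin's probability is the product of the left/right split probabilities along its root-to-leaf path.

```python
import numpy as np

def bin_probabilities(split_probs, depth):
    """Turn per-split probabilities into a distribution over 2**depth bins.

    split_probs[i] is P(go right | reached node i), with nodes in heap
    order (children of node i are 2*i + 1 and 2*i + 2). This layout is an
    assumption made for the sketch.
    """
    n_bins = 2 ** depth
    probs = np.ones(n_bins)
    for b in range(n_bins):
        node = 0
        for level in range(depth):
            # Read the bits of the bin index from most to least significant.
            go_right = (b >> (depth - 1 - level)) & 1
            p = split_probs[node]
            probs[b] *= p if go_right else 1.0 - p
            node = 2 * node + 1 + go_right
    return probs

# Hypothetical split probabilities for a depth-2 tree (3 internal nodes).
example = bin_probabilities(np.array([0.5, 0.2, 0.9]), depth=2)
```

In a network, the split probabilities would be sigmoid outputs conditioned on the feature vector. Because each split is a valid binary distribution, the bin probabilities sum to one by construction.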
Findings
Multiscale Nets (MSNs) excel with abundant data per feature
CDE Trend Filtering performs well with limited data
Both methods outperform baseline models in experiments
Abstract
The vast majority of the neural network literature focuses on predicting point values for a given set of response variables, conditioned on a feature vector. In many cases we need to model the full joint conditional distribution over the response variables rather than simply making point predictions. In this paper, we present two novel approaches to such conditional density estimation (CDE): Multiscale Nets (MSNs) and CDE Trend Filtering. Multiscale nets transform the CDE regression task into a hierarchical classification task by decomposing the density into a series of half-spaces and learning boolean probabilities of each split. CDE Trend Filtering applies a k-th order graph trend filtering penalty to the unnormalized logits of a multinomial classifier network, with each edge in the graph corresponding to a neighboring point on a discretized version of the density. We compare both…
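As a rough illustration of the trend filtering penalty described in the abstract, the sketch below simplifies the graph to a one-dimensional chain of neighboring bins, where a k-th order penalty reduces to the L1 norm of the (k+1)-th discrete differences of the logits. The function names, the weight `lam`, and the chain simplification are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())

def trend_filter_penalty(logits, k):
    """L1 norm of the (k+1)-th differences of the logits along a 1D chain.

    k=0 penalizes jumps between neighboring bins; k=1 penalizes changes in
    slope, so linear logits incur no penalty.
    """
    return np.abs(np.diff(logits, n=k + 1)).sum()

def penalized_nll(logits, target_bin, k=1, lam=0.1):
    # Multinomial negative log-likelihood plus the smoothness penalty.
    # lam is a hypothetical regularization weight.
    return -log_softmax(logits)[target_bin] + lam * trend_filter_penalty(logits, k)
```

The penalty acts on the unnormalized logits rather than on the probabilities, so it encourages neighboring bins to share structure without changing the softmax normalization step.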
Taxonomy
Topics: Neural Networks and Applications · Advanced Graph Neural Networks · Machine Learning and ELM
