Merging Two Cultures: Deep and Statistical Learning
Anindya Bhadra, Jyotishka Datta, Nick Polson, Vadim Sokolov, Jianeng Xu

TL;DR
This paper unifies deep learning and statistical modeling by framing deep architectures as nonlinear feature generators with probabilistic output layers, enabling scalable prediction and uncertainty quantification.
Contribution
It introduces a general framework combining sparse regularization, stochastic gradient optimization, and probabilistic output layers to merge deep and statistical learning benefits.
Findings
Deep models generate nonlinear features for statistical methods.
Probabilistic output layers enable uncertainty quantification.
Framework applies to regression, classification, and interpolation.
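As a rough illustration of the first two findings, the "GLM with a composite link" view can be written out as follows. The notation here (feature map \phi, layer maps f_l, activations \sigma_l, weights W_l and w, intercepts b_l and b_0, link g) is assumed for this sketch and is not taken from the paper's text.

```latex
% Deep layers act as a nonlinear feature generator:
\phi(x) = (f_L \circ f_{L-1} \circ \cdots \circ f_1)(x),
\qquad f_l(z) = \sigma_l(W_l z + b_l), \quad l = 1, \dots, L.

% The output layer is a GLM on the generated features,
% with link function g and exponential-family response Y:
\mathbb{E}[\, Y \mid X = x \,] = g^{-1}\!\big( w^\top \phi(x) + b_0 \big).
```

With L = 0 and \phi(x) = x this reduces to an ordinary GLM; a Gaussian response with identity link gives regression, and a Bernoulli response with logit link gives classification.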
Abstract
Merging the two cultures of deep and statistical learning provides insights into structured high-dimensional data. Traditional statistical modeling is still a dominant strategy for structured tabular data. Deep learning can be viewed through the lens of generalized linear models (GLMs) with composite link functions. Sufficient dimensionality reduction (SDR) and sparsity perform nonlinear feature engineering. We show that prediction, interpolation and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model. Thus a general framework for machine learning arises that first generates nonlinear features (a.k.a. factors) via sparse regularization and stochastic gradient optimisation, and second uses a stochastic output layer for predictive uncertainty. Rather than using shallow additive architectures as in many statistical models, deep learning…
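The two-stage idea in the abstract (generate nonlinear features, then put a probabilistic model on the output layer) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method: the hidden layer uses fixed random tanh weights as a stand-in for a network trained with sparse regularization and stochastic gradient descent, and the output layer is a conjugate Bayesian linear regression. The function names, the prior precision `alpha`, and the noise variance `sigma2` are all hypothetical choices made for the example.

```python
import numpy as np

def tanh_features(x, W, b):
    """Hidden layer: map inputs of shape (n, d) to features of shape (n, k)."""
    return np.tanh(x @ W + b)

def fit_bayes_linear(Phi, y, alpha=1.0, sigma2=0.01):
    """Gaussian posterior over output weights.

    Assumed prior: w ~ N(0, I / alpha).
    Assumed likelihood: y ~ N(Phi w, sigma2 * I).
    """
    k = Phi.shape[1]
    precision = alpha * np.eye(k) + Phi.T @ Phi / sigma2
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / sigma2
    return mean, cov

def predict(Phi_new, mean, cov, sigma2=0.01):
    """Predictive mean and variance at new feature rows."""
    mu = Phi_new @ mean
    # Row-wise quadratic form phi_i^T cov phi_i, plus observation noise.
    var = sigma2 + np.einsum("ij,jk,ik->i", Phi_new, cov, Phi_new)
    return mu, var

# Toy data (synthetic, for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(200)

# Stand-in for a trained deep feature generator: one random tanh layer.
W = rng.standard_normal((1, 50))
b = rng.standard_normal(50)

Phi = tanh_features(x, W, b)
w_mean, w_cov = fit_bayes_linear(Phi, y)

x_new = np.linspace(-3, 3, 5).reshape(-1, 1)
mu, var = predict(tanh_features(x_new, W, b), w_mean, w_cov)
```

Here `mu` gives point predictions and `var` gives a per-point predictive variance, so the same output layer supports both prediction and uncertainty quantification. Swapping the output layer for a Gaussian process or a GLM with a different response family would keep the same two-stage structure.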
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Neural Networks and Applications · Machine Learning and Data Classification
Methods: Tanh Activation · Sigmoid Activation · Principal Components Analysis · Long Short-Term Memory · Gaussian Process
