A data-driven model for spectra: Finding double redshifts in the Sloan Digital Sky Survey
P. Tsalmantza, David W. Hogg

TL;DR
This paper introduces a probabilistic, data-driven spectral modeling method that effectively detects double redshifts in SDSS spectra, outperforming traditional PCA by incorporating measurement uncertainties and handling missing data.
Contribution
The authors develop heteroscedastic matrix factorization, a novel probabilistic spectral analysis technique that accounts for observational uncertainties and improves redshift detection accuracy.
Findings
Successfully identified 129 of 131 lens candidates
Detected all known binary black-hole spectra
Minimal false positives in hypothesis testing
Abstract
We present a data-driven method - heteroscedastic matrix factorization, a kind of probabilistic factor analysis - for modeling or performing dimensionality reduction on observed spectra or other high-dimensional data with known but non-uniform observational uncertainties. The method uses an iterative inverse-variance-weighted least-squares minimization procedure to generate a best set of basis functions. The method is similar to principal components analysis, but with the substantial advantage that it uses measurement uncertainties in a responsible way and accounts naturally for poorly measured and missing data; it models the variance in the noise-deconvolved data space. A regularization can be applied, in the form of a smoothness prior (inspired by Gaussian processes) or a non-negative constraint, without making the method prohibitively slow. Because the method optimizes a justified…
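The iterative inverse-variance-weighted least-squares procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the model is a plain low-rank product, with data matrix X approximated by A @ G, fitted by alternating weighted least-squares solves for the coefficients A and the basis functions G. It omits the smoothness prior and the non-negative constraint mentioned above. The names hmf_sketch, chi2, X, W, A, G, and K are hypothetical, chosen for illustration only. Missing pixels are represented by setting their inverse variance to zero, so they carry no weight in the fit.

```python
import numpy as np


def chi2(X, W, A, G):
    """Inverse-variance-weighted squared residual of the model A @ G."""
    return float(np.sum(W * (X - A @ G) ** 2))


def hmf_sketch(X, W, K, n_iter=30, seed=0):
    """Alternating weighted least squares for X ~ A @ G (illustrative only).

    X : (N, M) data, one spectrum per row.
    W : (N, M) inverse variances; 0 marks a missing or unusable pixel.
    K : number of basis functions.
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape
    G = rng.normal(size=(K, M))   # random initial basis (an assumption)
    A = np.zeros((N, K))
    sqrtW = np.sqrt(W)
    history = []
    for _ in range(n_iter):
        # A-step: with G fixed, fit each spectrum's coefficients.
        for i in range(N):
            s = sqrtW[i]
            A[i] = np.linalg.lstsq(G.T * s[:, None], X[i] * s, rcond=None)[0]
        # G-step: with A fixed, fit each wavelength's basis values.
        for j in range(M):
            s = sqrtW[:, j]
            G[:, j] = np.linalg.lstsq(A * s[:, None], X[:, j] * s, rcond=None)[0]
        history.append(chi2(X, W, A, G))
    return A, G, history


# Synthetic demonstration: rank-2 "spectra" with non-uniform noise and gaps.
rng = np.random.default_rng(42)
N, M, K = 40, 30, 2
truth = rng.normal(size=(N, K)) @ rng.normal(size=(K, M))
sigma = rng.uniform(0.05, 0.2, size=(N, M))      # heteroscedastic noise levels
X = truth + sigma * rng.normal(size=(N, M))
W = 1.0 / sigma**2
W[rng.random(size=(N, M)) < 0.1] = 0.0           # about 10% missing pixels

A, G, history = hmf_sketch(X, W, K)
print("chi2 after first and last iteration:", history[0], history[-1])
```

Each half-step is an exact weighted least-squares solve with the other factor held fixed, so the weighted chi-squared cannot increase from one iteration to the next, up to floating-point rounding. Solving with np.linalg.lstsq on the square-root-weighted system rather than the normal equations is a design choice made here to stay well behaved when many weights are zero.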
