
TL;DR
The paper introduces the latent logarithm (lag), a probabilistically sound alternative to adding pseudocounts for log-transforming count data, accounting for measurement confidence and prior abundance.
Contribution
It proposes lag, a new method that models observed counts as noisy realizations of latent abundances, improving upon traditional pseudocount-based log transformations.
Findings
Lag provides a stable, probabilistically coherent transformation.
It accounts for measurement confidence and prior abundance.
It offers an intuitive alternative to pseudocount addition.
Abstract
Count or non-negative data are often log-transformed to reduce heteroscedasticity and improve scaling. To avoid undefined values where the data are zeros, a small pseudocount (e.g. 1) is added across the dataset prior to applying the transformation. This pseudocount considers neither the measured object's a priori abundance nor the confidence with which the measurement was made, making this practice convenient but statistically unfounded. I introduce here the latent logarithm, or lag. lag assumes that each observed measurement is a noisy realization of an unmeasured latent abundance. By taking the logarithm of this learned latent abundance, which reflects both sampling confidence/depth and the object's a priori abundance, lag provides a probabilistically coherent, stable, and intuitive alternative to the questionable, but conventional "log(x + pseudocount)."
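The abstract does not spell out how the latent abundance is learned, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes a Gamma-Poisson model in which the count is Poisson with mean `depth * abundance` and the abundance has a Gamma prior; the function names and the parameters `depth`, `prior_shape`, and `prior_rate` are hypothetical. It shows how taking the log of a posterior-mean abundance stays defined at zero while still responding to sampling depth, which a fixed pseudocount cannot do.

```python
import math


def log_pseudocount(count, pseudocount=1.0):
    """Conventional transform: log(count + pseudocount)."""
    return math.log(count + pseudocount)


def log_latent_abundance(count, depth, prior_shape, prior_rate):
    """Log of the posterior-mean abundance under an ASSUMED Gamma-Poisson model.

    Assumed model (hypothetical, not taken from the paper):
        abundance ~ Gamma(shape=prior_shape, rate=prior_rate)
        count | abundance ~ Poisson(depth * abundance)
    Conjugacy gives the posterior Gamma(prior_shape + count, prior_rate + depth),
    whose mean is (prior_shape + count) / (prior_rate + depth).
    """
    posterior_mean = (prior_shape + count) / (prior_rate + depth)
    return math.log(posterior_mean)


# Two zero counts observed at very different sampling depths.
shallow = log_latent_abundance(count=0, depth=1.0, prior_shape=0.5, prior_rate=1.0)
deep = log_latent_abundance(count=0, depth=100.0, prior_shape=0.5, prior_rate=1.0)

# The pseudocount transform cannot tell these two zeros apart.
pseudo_zero = log_pseudocount(0)
```

In this sketch a zero observed at high depth maps to a lower log-abundance than a zero observed at low depth, because the deeper measurement is stronger evidence that the object is rare. The pseudocount version returns the same value for both zeros.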
Taxonomy
Topics: Neural Networks and Applications · Time Series Analysis and Forecasting · Gaussian Processes and Bayesian Inference
