# Low-rank model with covariates for count data analysis

**Authors:** Geneviève Robin (CMAP, XPOP), Julie Josse (CMAP, XPOP), Eric Moulines (CMAP, LTCI, XPOP), Sylvain Sardy

arXiv: 1703.02296 · 2018-10-25

## TL;DR

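This paper introduces LORI (Low-Rank Interaction), a low-rank Poisson modeling approach for count tables with covariates. It comes with an upper bound on the estimation error, a fitting algorithm with automatic selection of the regularization parameter, and the R package lori. In simulations it improves on state-of-the-art methods for estimation and for imputation of missing values.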

## Contribution

The paper presents LORI, a methodology for count tables with covariates that combines a Poisson model, a fitting algorithm, and automatic selection of the regularization parameter, together with an upper bound on the estimation error.

## Key findings

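- In simulations on synthetic data, LORI improves on state-of-the-art methods in estimation accuracy.
- In the same simulations, LORI also improves on these methods when imputing missing count values.
- The method is illustrated on a well-known plant abundance data set, where its outputs are consistent with known results, and on a water-bird abundance table from ONCFS.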

## Abstract

Count data are collected in many scientific and engineering tasks including image processing, single-cell RNA sequencing and ecological studies. Such data sets often contain missing values, for example because some ecological sites cannot be reached in a certain year. In addition, in many instances, side information is also available, for example covariates about ecological sites or species. Low-rank methods are popular to denoise and impute count data, and benefit from a substantial theoretical background. Extensions accounting for covariates have been proposed, but to the best of our knowledge their theoretical and empirical properties have not been thoroughly studied, and little software is available for practitioners. We propose a complete methodology called LORI (Low-Rank Interaction), including a Poisson model, an algorithm, and automatic selection of the regularization parameter, to analyze count tables with covariates. We also derive an upper bound on the estimation error. We provide a simulation study with synthetic data, revealing empirically that LORI improves on state-of-the-art methods in terms of estimation and imputation of the missing values. We illustrate how the method can be interpreted through visual displays with the analysis of a well-known plant abundance data set, and show that the LORI outputs are consistent with known results. Finally, we demonstrate the relevance of the methodology by analyzing a water-birds abundance table from the French national agency for wildlife and hunting management (ONCFS). The method is available in the R package lori on the Comprehensive R Archive Network (CRAN).
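
To make the idea of a low-rank Poisson model with covariates more concrete, here is a minimal Python sketch. It is not the algorithm from the paper and not the API of the R package lori. It assumes a log link, a mean structure `X = R @ B + Theta` with row covariates `R`, a nuclear-norm penalty on the interaction matrix `Theta`, and a plain proximal gradient fit restricted to observed entries. The function names, the parameters `lam` and `step`, and the simulated data are all hypothetical, and the automatic selection of the regularization parameter is not shown.

```python
import numpy as np

def svd_soft_threshold(M, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def fit_poisson_low_rank(Y, R, lam, step=0.01, n_iter=500):
    """Illustrative fit of Y_ij ~ Poisson(exp(X_ij)) with X = R @ B + Theta.

    Y: (n, p) counts, np.nan marks missing entries.
    R: (n, k) row covariates (assumed structure, not taken from the paper).
    lam: weight of the nuclear-norm penalty on Theta.
    """
    mask = ~np.isnan(Y)
    Y0 = np.where(mask, Y, 0.0)          # missing entries contribute no gradient
    n, p = Y.shape
    B = np.zeros((R.shape[1], p))
    Theta = np.zeros((n, p))
    for _ in range(n_iter):
        X = np.clip(R @ B + Theta, -20, 20)   # clip to guard against overflow in exp
        grad = mask * (np.exp(X) - Y0)        # Poisson NLL gradient w.r.t. X, observed entries only
        B -= (step / n) * (R.T @ grad)        # gradient step on covariate effects, scaled by n
        Theta = svd_soft_threshold(Theta - step * grad, step * lam)  # proximal step
    return B, Theta

# Small simulated example (all values below are made up for illustration).
rng = np.random.default_rng(0)
n, p = 30, 8
R = np.column_stack([np.ones(n), rng.normal(size=n)])          # intercept + one covariate
B_true = np.vstack([np.full(p, 0.5), rng.normal(scale=0.2, size=p)])
Theta_true = 0.3 * np.outer(rng.normal(size=n), rng.normal(size=p))  # rank-one interaction
Y = rng.poisson(np.exp(R @ B_true + Theta_true)).astype(float)
Y[rng.random((n, p)) < 0.2] = np.nan                           # hide about 20% of the entries

B, Theta = fit_poisson_low_rank(Y, R, lam=2.0)
# Impute missing counts with the fitted Poisson means, keep observed counts as they are.
imputed = np.where(np.isnan(Y), np.exp(R @ B + Theta), Y)
```

In this sketch, larger values of `lam` shrink more singular values of `Theta` to zero, so the estimated interaction has lower rank. Imputation simply replaces each missing count by its fitted Poisson mean.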

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.02296/full.md

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/1703.02296/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/1703.02296/full.md

---
Source: https://tomesphere.com/paper/1703.02296