# An Entropy-Based Approach to Model Selection with Application to Single-Cell Time-Stamped Snapshot Data

**Authors:** William C. L. Stewart, Ciriyam Jayaprakash, Jayajit Das

PMC · DOI: 10.3390/e27030274 · Entropy · 2025-03-06

## TL;DR

This paper introduces a new entropy-based method for model selection in single-cell time-stamped data to better understand protein abundance dynamics.

## Contribution

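The approach combines an entropy-based criterion with split-sample techniques: model parameters are estimated with the generalized method of moments (GMM), and candidate-model densities are estimated with kernel density estimators and a Gaussian copula. This makes model selection possible when closed-form likelihoods are unavailable.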

## Key findings

- On simulated data, the entropy-based method selects the "ground truth" model from a set of competing mechanistic models.
- Bootstrap procedures provide reliable model selection probabilities for competing models.
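As a rough illustration of how bootstrap model selection probabilities can be computed, the sketch below resamples held-out cells with replacement and counts how often each candidate attains the lowest average negative log density. The function name `selection_probabilities`, the input layout, and the toy Gaussian candidates are assumptions made for illustration; the paper's actual procedure may resample and refit differently.

```python
import numpy as np
from scipy.stats import norm


def selection_probabilities(neg_log_dens, n_boot=1000, seed=0):
    """Bootstrap the fraction of resamples in which each model scores best.

    neg_log_dens: (n_cells, n_models) array holding -log f_k(x_i), i.e. the
    negative log density of held-out cell i under candidate model k.
    (Hypothetical input layout, chosen for this sketch.)
    """
    rng = np.random.default_rng(seed)
    n_cells, n_models = neg_log_dens.shape
    wins = np.zeros(n_models)
    for _ in range(n_boot):
        # Resample cells with replacement, then pick the lowest mean score.
        idx = rng.integers(0, n_cells, size=n_cells)
        wins[np.argmin(neg_log_dens[idx].mean(axis=0))] += 1
    return wins / n_boot


# Toy usage: data drawn from N(0, 1); candidates are N(0, 1) and N(0.5, 1).
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=500)
scores = np.column_stack([-norm.logpdf(x, 0.0, 1.0), -norm.logpdf(x, 0.5, 1.0)])
probs = selection_probabilities(scores)
```

Here `probs[k]` is the share of bootstrap resamples in which candidate `k` had the lowest score, which can be read as relative support for that candidate.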

## Abstract

Recent single-cell experiments that measure copy numbers of over 40 proteins in thousands of individual cells at different time points [time-stamped snapshot (TSS) data] exhibit cell-to-cell variability. Because the same cells cannot be tracked over time, TSS data provide key information about the statistical time-evolution of protein abundances in single cells, information that could yield insights into the mechanisms influencing the biochemical signaling kinetics of a cell. However, when multiple candidate models (i.e., mechanistic models applied to initial protein abundances) can potentially explain the same TSS data, selecting the best model (i.e., model selection) is often challenging. For example, popular approaches like Kullback–Leibler divergence and Akaike’s Information Criterion are often difficult to implement largely because mathematical expressions for the likelihoods of candidate models are typically not available. To perform model selection, we introduce an entropy-based approach that uses split-sample techniques to exploit the availability of large data sets and uses (1) existing generalized method of moments (GMM) software to estimate model parameters, and (2) standard kernel density estimators and a Gaussian copula to estimate candidate models. Using simulated data, we show that our approach can select the “ground truth” from a set of competing mechanistic models. Then, to assess the relative support for a candidate model, we compute model selection probabilities using a bootstrap procedure.
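The density-estimation step described in the abstract can be sketched as follows: fit a kernel density estimator to each marginal of samples simulated from a candidate model, couple the marginals with a Gaussian copula, and score the candidate by its cross-entropy on held-out observed cells (lower is better). This is a minimal sketch, not the authors' implementation. The helper names, the bivariate normal stand-ins for simulated mechanistic models, and the use of plain cross-entropy as the criterion are all assumptions, and the GMM parameter-estimation step is omitted.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm


def _normal_scores(kde, x):
    # KDE-based CDF values mapped to standard-normal scores.
    u = np.array([kde.integrate_box_1d(-np.inf, xi) for xi in x])
    return norm.ppf(np.clip(u, 1e-6, 1 - 1e-6))


def fit_copula_kde(sim):
    """Fit KDE marginals and a Gaussian copula to samples of shape (n, d), d >= 2."""
    d = sim.shape[1]
    kdes = [gaussian_kde(sim[:, j]) for j in range(d)]
    z = np.column_stack([_normal_scores(kdes[j], sim[:, j]) for j in range(d)])
    corr = np.corrcoef(z, rowvar=False)  # copula correlation matrix
    return kdes, corr


def log_density(kdes, corr, x):
    """Log joint density: sum of log marginals plus log Gaussian-copula density."""
    d = x.shape[1]
    log_marg = sum(np.log(np.maximum(kdes[j](x[:, j]), 1e-300)) for j in range(d))
    z = np.column_stack([_normal_scores(kdes[j], x[:, j]) for j in range(d)])
    _, logdet = np.linalg.slogdet(corr)
    quad = np.einsum("ij,jk,ik->i", z, np.linalg.inv(corr) - np.eye(d), z)
    return log_marg - 0.5 * logdet - 0.5 * quad


def cross_entropy(kdes, corr, held_out):
    # Average negative log density of held-out cells under the fitted model.
    return -np.mean(log_density(kdes, corr, held_out))


# Toy usage (hypothetical stand-ins for simulated mechanistic models).
rng = np.random.default_rng(0)
cov = [[1.0, 0.8], [0.8, 1.0]]
observed = rng.multivariate_normal([0, 0], cov, size=400)       # held-out cells
sim_a = rng.multivariate_normal([0, 0], cov, size=400)          # correct dependence
sim_b = rng.multivariate_normal([0, 0], np.eye(2), size=400)    # ignores dependence
score_a = cross_entropy(*fit_copula_kde(sim_a), observed)
score_b = cross_entropy(*fit_copula_kde(sim_b), observed)
```

In this toy setup candidate A reproduces the dependence between the two abundances and therefore attains the lower cross-entropy, so it would be the selected model.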

## Full-text entities

- **Genes:** Nr0b2 (nuclear receptor subfamily 0, group B, member 2) [NCBI Gene 23957] {aka SHP, SHP-1, Shp1}, Klra1 (killer cell lectin-like receptor, subfamily A, member 1) [NCBI Gene 16627] {aka A1, CH29-493D4.3, Klra22, Ly49a, Ly49o<129>, Ly49v}, Syk (spleen tyrosine kinase) [NCBI Gene 20963] {aka Sykb}, Vav1 (vav guanine nucleotide exchange factor 1) [NCBI Gene 22324] {aka Vav, vav-T}, Fcgr3 (Fc receptor, IgG, low affinity III) [NCBI Gene 14131] {aka CD16}
- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11941135/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11941135/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC11941135/full.md

---
Source: https://tomesphere.com/paper/PMC11941135