# Optimal solution to the set cover problem with a vicinity constraint for estimating genotype tissue expression profiles

**Authors:** Jiahong Dong, Stephen Brown, Kevin Truong

PMC · DOI: 10.1093/bioadv/vbaf163 · Bioinformatics Advances · 2025-07-04

## TL;DR

This paper introduces a dynamic programming algorithm that selects strategically located reference genes, so that genome-wide genotype tissue expression profiles can be estimated from fewer experiments.

## Contribution

The authors propose a dynamic programming algorithm that solves the vicinity set cover problem with tractable runtime by exploiting the consecutive ordering of genes within vicinity sets.

## Key findings

- The algorithm reduces the number of required experiments by leveraging genomic proximity of genes.
- It achieves tractable runtime while minimizing the average distance between reference and non-reference genes.
- The method applies to organisms without existing genotype tissue expression data and to new human datasets with expanded tissue sets.
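
The two goals in the findings above (few reference genes, small reference-to-gene distance) can be written as a two-stage objective. This is one possible formalization, not necessarily the one used in the paper: the gene positions $p_1 \le \dots \le p_n$ along a chromosome, the vicinity radius $d$, and the reference set $R$ are illustrative symbols introduced here.

```latex
\begin{aligned}
k^{*} &= \min_{R \subseteq \{1,\dots,n\}} |R|
  \quad \text{subject to} \quad
  \forall i \;\; \exists r \in R : \; |p_i - p_r| \le d, \\[4pt]
R^{*} &= \operatorname*{arg\,min}_{\substack{R \text{ feasible} \\ |R| = k^{*}}}
  \; \frac{1}{n - k^{*}} \sum_{i \notin R} \, \min_{r \in R} |p_i - p_r| .
\end{aligned}
```

The first line is an ordinary set cover objective in which each candidate set is the vicinity of one gene. The second line breaks ties among minimum covers by the average distance from each non-reference gene to its nearest reference gene, and it is only meaningful when $n > k^{*}$.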

## Abstract

Genes located in close genomic proximity tend to have more similar genotype tissue expression profiles. This suggests that expression profiles for the entire genome could be estimated using a smaller set of experimentally determined profiles from carefully selected reference genes, thereby reducing the need for extensive experimental measurements.

We address this challenge by formulating it as a set cover problem, aiming to identify the minimum number of gene sets that cover the entire genome. However, traditional set cover algorithms are either too slow or yield non-optimal results on large datasets. To overcome this limitation, we developed a dynamic programming algorithm that leverages the consecutive ordering of genes within vicinity sets. Our algorithm solves this vicinity set cover problem with tractable runtime while minimizing the average distance between reference genes and the non-reference genes in their vicinity, thereby maximizing estimation accuracy. The algorithm can be used to reduce the number of required experiments in organisms lacking genotype tissue expression data or in new human datasets with expanded tissue sets. Finally, it has broader applications to set cover optimization problems in other fields.
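
To illustrate why consecutive ordering makes the problem tractable, here is a minimal Python sketch of a dynamic program over a single chromosome. It is not the authors' implementation (see their repository for that). It assumes that a vicinity is defined by a fixed genomic distance `d` and that genes are given as sorted coordinates. The function name `vicinity_cover` and its parameters are hypothetical. The sketch splits the sorted genes into consecutive blocks, gives each block one reference gene within `d` of every member, and minimizes first the number of blocks, then the summed distance to the references.

```python
from itertools import accumulate


def vicinity_cover(positions, d):
    """Pick reference genes for sorted gene positions (illustrative sketch).

    Returns (reference_indices, total_distance). The number of references is
    minimized first, then the summed gene-to-reference distance.
    """
    p = list(positions)
    if p != sorted(p):
        raise ValueError("positions must be sorted")
    n = len(p)
    prefix = [0] + list(accumulate(p))  # prefix[k] = p[0] + ... + p[k-1]

    def block_cost(i, j, r):
        # Sum of |p[k] - p[r]| over k = i..j, where i <= r <= j.
        left = p[r] * (r - i) - (prefix[r] - prefix[i])
        right = (prefix[j + 1] - prefix[r + 1]) - p[r] * (j - r)
        return left + right

    inf = float("inf")
    # best[k] = (reference count, total distance) for covering genes 0..k-1.
    best = [(0, 0)] + [(inf, inf)] * n
    # choice[k] = (start of last block, reference index of last block).
    choice = [None] * (n + 1)

    for j in range(n):                      # j = last gene of the block
        for r in range(j, -1, -1):          # r = candidate reference gene
            if p[j] - p[r] > d:
                break                       # reference too far from block end
            for i in range(r, -1, -1):      # i = first gene of the block
                if p[r] - p[i] > d:
                    break                   # reference too far from block start
                cand = (best[i][0] + 1, best[i][1] + block_cost(i, j, r))
                if cand < best[j + 1]:      # lexicographic: count, then distance
                    best[j + 1] = cand
                    choice[j + 1] = (i, r)

    # Walk back through the stored choices to recover the reference genes.
    refs, k = [], n
    while k > 0:
        i, r = choice[k]
        refs.append(r)
        k = i
    return refs[::-1], best[n][1]


refs, total = vicinity_cover([0, 1, 2, 10, 11, 12], d=1)
print(refs, total)  # [1, 4] 4
```

Because every gene's nearest reference defines a consecutive block on the line, restricting the search to consecutive blocks loses no solutions, and once the reference count is fixed at its minimum, minimizing the summed distance also minimizes the average distance. The loops only visit genes within `d` of the candidate reference, so the work per gene depends on local gene density rather than on genome size.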

The source code, along with all implementation details, is available at: https://github.com/sensationTI/vicinity_set_cover.

## Full-text entities

- **Genes:** PRM3 (protamine 3) [NCBI Gene 58531]
- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]
- **Cell lines:** MESA — Homo sapiens (Human), Childhood T acute lymphoblastic leukemia, Cancer cell line (CVCL_XF44)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12313015/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12313015/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/PMC12313015/full.md

---
Source: https://tomesphere.com/paper/PMC12313015