# SSL-VQ: vector-quantized variational autoencoders for semi-supervised prediction of therapeutic targets across diverse diseases

**Authors:** Satoko Namba, Chen Li, Noriko Yuyama Otani, Yoshihiro Yamanishi

PMC · DOI: 10.1093/bioinformatics/btaf039 · Bioinformatics · 2025-01-28

## TL;DR

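This paper introduces SSL-VQ, a semi-supervised machine learning method built on vector-quantized variational autoencoders (VQ-VAEs), to predict therapeutic targets across diseases, including uncharacterized diseases with no known therapeutic targets.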

## Contribution

A novel semi-supervised learning approach that uses multimodal VQ-VAEs to predict therapeutic targets, covering both uncharacterized diseases without known targets and uncharacterized proteins without known indications.

## Key findings

- The method's efficacy is demonstrated for 79 diseases, including uncharacterized diseases without known therapeutic targets.
- Cross-cell and cross-disease representation learning improves prediction accuracy.
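- The model supports three practical scenarios: target repositioning for target–disease pairs, new target prediction for uncharacterized diseases, and new indication prediction for uncharacterized proteins.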

## Abstract

Identifying effective therapeutic targets poses a challenge in drug discovery, especially for uncharacterized diseases without known therapeutic targets (e.g. rare diseases, intractable diseases).

This study presents a novel machine learning approach using multimodal vector-quantized variational autoencoders (VQ-VAEs) for predicting therapeutic target molecules across diseases. To address the lack of known therapeutic target–disease associations, we incorporate the information on uncharacterized diseases without known targets or uncharacterized proteins without known indications (applicable diseases) in the semi-supervised learning (SSL) framework. The method integrates disease-specific and protein perturbation profiles with genetic perturbations (e.g. gene knockdowns and gene overexpressions) at the transcriptome level. Cross-cell representation learning, facilitated by VQ-VAEs, was performed to extract informative features from protein perturbation profiles across diverse human cell types. Concurrently, cross-disease representation learning was performed, leveraging VQ-VAE, to extract informative features reflecting disease states from disease-specific profiles. The model’s applicability to uncharacterized diseases or proteins is enhanced by considering the consistency between disease-specific and patient-specific signatures. The efficacy of the method is demonstrated across three practical scenarios for 79 diseases: target repositioning for target–disease pairs, new target prediction for uncharacterized diseases, and new indication prediction for uncharacterized proteins. This method is expected to be valuable for identifying therapeutic targets across various diseases.

Code: github.com/YamanishiLab/SSL-VQ. Data: DOI 10.5281/zenodo.14644837.
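
The abstract relies on VQ-VAEs for both cross-cell and cross-disease representation learning. As a rough illustration of the quantization step that gives VQ-VAEs their name, the sketch below maps each continuous latent vector to its nearest entry in a codebook. It is a generic, minimal example and not the authors' implementation: the function names (`quantize`, `commitment_distance`), the toy codebook, and the two-dimensional latents are all assumptions made for illustration, and a real model would learn the codebook jointly with an encoder and decoder.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Replace each latent vector with its nearest codebook entry.

    Returns the chosen codebook indices and the quantized vectors.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in latents:
        # Nearest-neighbour lookup over the (hypothetical) codebook.
        idx = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        indices.append(idx)
        quantized.append(list(codebook[idx]))
    return indices, quantized


def commitment_distance(latents: Sequence[Vector], quantized: Sequence[Vector]) -> float:
    """Mean squared distance between latents and their quantized versions.

    In VQ-VAE training, a term of this form (scaled by a weight) keeps
    encoder outputs close to the codebook entries they are assigned to.
    """
    total = sum(squared_distance(z, q) for z, q in zip(latents, quantized))
    return total / len(latents)


# Toy data: three codebook entries and three encoder outputs (illustrative only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.9, 1.2], [-0.8, 0.7], [0.1, -0.2]]

indices, quantized = quantize(latents, codebook)
print(indices)    # codebook index chosen for each latent
print(quantized)  # discrete representation passed on to a decoder
print(commitment_distance(latents, quantized))
```

In this toy setup each latent snaps to a different codebook entry, so downstream components see a small discrete vocabulary of states rather than arbitrary continuous vectors. How SSL-VQ shares such representations across cell types and diseases is described in the full paper and the linked repository.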

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11842052/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11842052/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/PMC11842052/full.md

---
Source: https://tomesphere.com/paper/PMC11842052