# MMPCS: multi-view molecular pretraining based on consistency information and specific information

**Authors:** Chenyang Xie, Yingying Song, Song He, Xiaochen Bo, Zhongnan Zhang

PMC · DOI: 10.1093/bioinformatics/btag028 · Bioinformatics · 2026-01-14

## TL;DR

This paper introduces MMPCS, a multi-view molecular pretraining method that improves downstream predictions by combining information shared across molecular views (SMILES sequences and 2D molecular graphs) with information specific to each view.

## Contribution

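MMPCS explicitly factorizes each view's molecular representation into a consistency component shared across views and a specific component unique to that view, and reports the highest average performance among the compared pretraining methods on molecular property prediction.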
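
The factorization can be written down as follows. This is a hedged formalization in our own notation, not the paper's equations: $f_{\mathrm{G}}$ stands for the GIN graph encoder and $f_{\mathrm{S}}$ for the RoBERTa SMILES encoder, $P_v$ is a hypothetical learned projection that extracts the consistency component, and $E$, $D$ are the encoder and decoder of the autoencoder. Treating the specific part as the remainder, expressing alignment as cross-view reconstruction, and averaging the two consistency components are assumptions made for illustration.

$$
\begin{aligned}
h^{(v)} &= f_v\big(x^{(v)}\big), \qquad v \in \{\mathrm{G}, \mathrm{S}\} \\
c^{(v)} &= P_v\, h^{(v)}, \qquad s^{(v)} = h^{(v)} - c^{(v)} \\
\mathcal{L}_{\mathrm{align}} &= \big\lVert D\big(E(c^{(\mathrm{G})})\big) - c^{(\mathrm{S})} \big\rVert_2^2 + \big\lVert D\big(E(c^{(\mathrm{S})})\big) - c^{(\mathrm{G})} \big\rVert_2^2 \\
z &= \Big[\tfrac{1}{2}\big(c^{(\mathrm{G})} + c^{(\mathrm{S})}\big);\; s^{(\mathrm{G})};\; s^{(\mathrm{S})}\Big]
\end{aligned}
$$

Under this reading, minimizing $\mathcal{L}_{\mathrm{align}}$ pushes the two consistency components to carry the same information. Meanwhile, $s^{(\mathrm{G})}$ and $s^{(\mathrm{S})}$ retain whatever each view contributes on its own, and $z$ is the vector a downstream predictor would consume.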

## Key findings

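- MMPCS achieved the highest average performance on both classification and regression molecular property prediction tasks when benchmarked against 16 state-of-the-art molecular pretraining methods.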
- It showed strong performance in predicting drug-target binding affinity and cancer drug response.
- A case study on the SARS-CoV-2 Omicron variant illustrated its potential for drug repurposing.

## Abstract

The goal of molecular representation learning is to automate the extraction of molecular features, a critical task in cheminformatics and drug discovery. While pretraining on multiple molecular views, such as SMILES strings, 2D graphs, and 3D conformations, has advanced the field, integrating these views effectively to produce superior representations remains a challenge.

To bridge this gap, we propose a novel multi-view molecular pretraining method termed MMPCS, which explicitly factorizes representations into consistency and specific information. Our approach utilizes the Graph Isomorphism Network and the RoBERTa model to encode 2D molecular topological graphs and SMILES sequences, respectively. Each resulting molecular embedding is decomposed into a shared consistency component and a view-specific remainder. An autoencoder then aligns the consistency information across views. The combined consistency and view-specific representations serve as input for downstream tasks, enabling precise and task-aware predictions. When benchmarked against 16 state-of-the-art molecular pretraining methods, MMPCS achieved the highest average performance across both classification and regression tasks for molecular property prediction. It also delivered outstanding results in predicting drug-target binding affinity and cancer drug response, demonstrating its robustness and broad applicability. Additionally, a case study on the SARS-CoV-2 Omicron variant highlights the potential of MMPCS in facilitating drug repurposing efforts.
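The pipeline described in the abstract (encode each view, split each embedding into consistency and specific parts, align the consistency parts, combine everything for downstream use) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation, which is in the linked GitHub repository. The GIN layer uses the standard sum-aggregation update $\mathrm{MLP}((1+\epsilon)h_v + \sum_{u \in N(v)} h_u)$. Everything else is hypothetical: a random vector stands in for RoBERTa's pooled SMILES embedding, the consistency part is a linear projection with the specific part taken as the remainder, the autoencoder alignment is modeled as cross-view reconstruction, and the helper names (`gin_layer`, `decompose`, `align_loss`, `downstream_input`) are ours. All weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 8, 4  # embedding size and autoencoder bottleneck (arbitrary toy values)

def gin_layer(h, adj, w1, w2, eps=0.0):
    """One GIN update: MLP((1 + eps) * h_v + sum of neighbour features)."""
    agg = (1.0 + eps) * h + adj @ h        # sum aggregation over neighbours
    return np.maximum(agg @ w1, 0.0) @ w2  # two-layer MLP with ReLU in between

def encode_graph(x, adj, layers):
    """Toy 2D-graph encoder: stacked GIN layers followed by a sum readout."""
    h = x
    for w1, w2 in layers:
        h = np.maximum(gin_layer(h, adj, w1, w2), 0.0)
    return h.sum(axis=0)  # graph-level embedding of size D

def decompose(h, proj):
    """Hypothetical split: consistency part = linear projection of h,
    view-specific part = the remainder, so c + s reconstructs h."""
    c = h @ proj
    return c, h - c

def align_loss(c_g, c_s, enc, dec):
    """Hypothetical cross-view autoencoder loss: reconstruct each view's
    consistency part from the other view's consistency part."""
    rec_s = np.tanh(c_g @ enc) @ dec
    rec_g = np.tanh(c_s @ enc) @ dec
    return float(np.sum((rec_s - c_s) ** 2) + np.sum((rec_g - c_g) ** 2))

def downstream_input(c_g, s_g, c_s, s_s):
    """Assumed combination: averaged consistency part plus both specific parts."""
    return np.concatenate([(c_g + c_s) / 2.0, s_g, s_s])

# Toy molecule: 3 heavy atoms in a chain (the C-C-O skeleton of ethanol),
# each with 4 made-up atom features.
x = rng.normal(size=(3, 4))
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
layers = [(rng.normal(size=(4, D)), rng.normal(size=(D, D))),
          (rng.normal(size=(D, D)), rng.normal(size=(D, D)))]

h_graph = encode_graph(x, adj, layers)
h_smiles = rng.normal(size=D)  # stand-in for a pooled RoBERTa SMILES embedding

c_g, s_g = decompose(h_graph, rng.normal(size=(D, D)) / D)
c_s, s_s = decompose(h_smiles, rng.normal(size=(D, D)) / D)
loss = align_loss(c_g, c_s, rng.normal(size=(D, K)), rng.normal(size=(K, D)))
z = downstream_input(c_g, s_g, c_s, s_s)
print(z.shape, loss >= 0.0)  # → (24,) True
```

Taking the specific part as the remainder $h - c$ guarantees that the two components sum back to the original embedding, so the split itself discards nothing. This summary does not say whether MMPCS enforces that property or instead uses separate projection heads with an extra constraint between them.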

The source code and datasets supporting this study are publicly available at GitHub (https://github.com/xmubiocode/MMPCS) and Zenodo (https://doi.org/10.5281/zenodo.18182748).

## Linked entities

- **Diseases:** cancer (MONDO:0004992), SARS-CoV-2 (MONDO:0100096)

## Full-text entities

- **Genes:** TXK (TXK tyrosine kinase) [NCBI Gene 7294] {aka BTKL, PSCTK5, PTK4, RLK, TKL}
- **Diseases:** Malaria (MESH:D008288), PCC (MESH:C536353), COVID-19 (MESH:D000086382), SCC (MESH:D010300), lung inflammation (MESH:D011014), CEP (MESH:D017092), infection (MESH:D007239), toxicity (MESH:D064420), cancer (MESH:D009369)
- **Chemicals:** Abemaciclib (MESH:C000590451), Lp (MESH:D008070), halogens (MESH:D006219), Brigatinib (MESH:C000598580), Copanlisib (MESH:C000589253)
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049], Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]
- **Mutations:** V600E

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12881828/full.md

## References

70 references — full list in the complete paper: https://tomesphere.com/paper/PMC12881828/full.md

---
Source: https://tomesphere.com/paper/PMC12881828