Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification

Gexin Huang, Chenfei Wu, Mingjie Li, Xiaojun Chang, Ling Chen, Ying Sun, Shen Zhao, Xiaodan Liang, and Liang Lin

arXiv:2406.02990 · cs.CV · June 6, 2024

TL;DR

This paper introduces a novel multi-label Transformer model, BPGT, that leverages biomedical knowledge and gene relationships to improve genetic mutation prediction from whole slide images, addressing inefficiencies and biological oversight in prior methods.

Contribution

The paper proposes a biological-knowledge enhanced Transformer architecture with a gene encoder and label decoder, integrating linguistic and biomedical knowledge for better mutation prediction.

Findings

01. BPGT outperforms state-of-the-art methods on the TCGA benchmark.

02. The gene encoder effectively captures gene relationships and priors.

03. The model improves prediction accuracy by emphasizing mutation status comparisons.

Abstract

Predicting genetic mutations from whole slide images is indispensable for cancer diagnosis. However, existing work training multiple binary classification models faces two challenges: (a) Training multiple binary classifiers is inefficient and would inevitably lead to a class imbalance problem. (b) The biological relationships among genes are overlooked, which limits the prediction performance. To tackle these challenges, we innovatively design a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances. BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules: (a) A gene graph whose node features are the genes' linguistic descriptions and the cancer phenotype, with edges modeled by genes' pathway associations and mutation consistencies. (b) A knowledge association module that…
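The contrast the abstract draws between many binary classifiers and one multi-label model can be illustrated with a minimal sketch. It is not the BPGT implementation: it assumes a generic setup in which a single model emits one logit per gene and is trained with a mean binary cross-entropy over all genes at once. The gene list, logits, and function names below are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy across all gene labels for one slide.

    Generic multi-label objective (an assumption for illustration),
    not the loss used in the paper.
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)

def predict_mutations(logits, threshold=0.5):
    """Threshold each gene's probability independently."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]

# Hypothetical example: one slide, three illustrative gene labels.
GENES = ["TP53", "KRAS", "EGFR"]
logits = [2.0, -1.0, 0.0]   # one shared model, one logit per gene
targets = [1, 0, 1]         # 1 = mutated, 0 = wild type

loss = multilabel_bce(logits, targets)
preds = predict_mutations(logits)
print(dict(zip(GENES, preds)), round(loss, 4))
```

In this formulation a single forward pass yields a prediction for every gene, so the per-gene models of the multiple-classifier setup are replaced by one shared model with a joint objective.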

Code & Models

Repositories

gexinh/bpgt (PyTorch, official)

Taxonomy

Topics: Machine Learning in Bioinformatics · Genetics, Bioinformatics, and Biomedical Research · Biomedical Text Mining and Ontologies

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention