Efficient Multi-Slide Visual-Language Feature Fusion for Placental Disease Classification

Hang Guo; Qing Zhang; Zixuan Gao; Siyuan Yang; Shulin Peng; Xiang Tao; Ting Yu; Yan Wang; Qingli Li

arXiv:2508.03277 · cs.CV · August 6, 2025

TL;DR

This paper presents EmmPD, an efficient multimodal framework for placental disease classification from whole slide images, combining advanced patch selection, graph-based feature fusion, and medical report integration to improve accuracy and reduce computation.

Contribution

We introduce a novel two-stage patch selection module and a hybrid multimodal fusion approach that together balance computational efficiency against rich feature extraction for placental disease diagnosis.

Findings

1. Achieves state-of-the-art performance on multiple datasets.
2. Effectively balances computational cost and diagnostic accuracy.
3. Demonstrates robustness across different datasets.

Abstract

Accurate prediction of placental diseases via whole slide images (WSIs) is critical for preventing severe maternal and fetal complications. However, WSI analysis presents significant computational challenges due to the massive data volume. Existing WSI classification methods encounter critical limitations: (1) inadequate patch selection strategies that either compromise performance or fail to sufficiently reduce computational demands, and (2) the loss of global histological context resulting from patch-level processing approaches. To address these challenges, we propose an Efficient multimodal framework for Patient-level placental disease Diagnosis, named EmmPD. Our approach introduces a two-stage patch selection module that combines parameter-free and learnable compression strategies, optimally balancing computational efficiency with critical feature preservation. Additionally, we…
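The two-stage patch selection idea mentioned in the abstract can be pictured with a rough sketch. This is not the authors' implementation: the abstract does not say which parameter-free or learnable strategy EmmPD uses, so both stages below are stand-ins chosen for illustration. Stage 1 assumes a distance-from-mean heuristic with no trainable weights, and stage 2 assumes a linear scorer whose weights would normally be learned (here they are random placeholders). All function names, sizes, and parameters are hypothetical.

```python
import math
import random


def parameter_free_select(features, keep):
    """Stage 1 (assumed heuristic): keep the `keep` patches whose features lie
    farthest from the slide's mean feature. No trainable parameters."""
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / len(features) for d in range(dim)]
    dist = [math.dist(f, mean) for f in features]
    order = sorted(range(len(features)), key=lambda i: dist[i], reverse=True)
    return sorted(order[:keep])


def learnable_select(features, indices, weights, keep):
    """Stage 2 (assumed form): score each surviving patch with a linear scorer
    and keep the top `keep`. `weights` stands in for trained parameters."""
    scores = {i: sum(w * x for w, x in zip(weights, features[i])) for i in indices}
    top = sorted(indices, key=lambda i: scores[i], reverse=True)[:keep]
    return sorted(top)


def two_stage_select(features, weights, keep_stage1, keep_stage2):
    """Chain the cheap parameter-free filter with the learnable one."""
    stage1 = parameter_free_select(features, keep_stage1)
    return learnable_select(features, stage1, weights, keep_stage2)


# Toy data: 1000 patch feature vectors of dimension 8 (hypothetical sizes).
random.seed(0)
patches = [[random.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
scorer = [random.gauss(0, 1) for _ in range(8)]  # placeholder for learned weights

selected = two_stage_select(patches, scorer, keep_stage1=200, keep_stage2=32)
print(len(selected))
```

The point of the ordering is cost: the parameter-free stage discards most patches before the more expensive learnable stage has to score anything, which is one plausible way to trade computation against feature preservation.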
