HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction

Ruicheng Yuan; Zhenxuan Zhang; Anbang Wang; Liwei Hu; Xiangqian Hua; Yaya Peng; Jiawei Luo; Guang Yang

arXiv:2603.19957 · cs.CV · March 23, 2026

TL;DR

HiPath is a hierarchical vision-language model for structured pathology report prediction. It encodes multi-granular diagnostic information from pathology images and reports, and it outperforms baselines on real-world Chinese pathology data.

Contribution

The paper introduces HiPath, a lightweight framework for structured pathology report prediction that adds three trainable hierarchical modules (15M parameters in total) on top of frozen UNI2 and Qwen3 backbones.

Findings

01. Achieves 68.9% strict accuracy on real-world data.

02. Maintains a high safety rate of 97.3%.

03. Generalizes well across hospitals with minimal accuracy drop.

Abstract

Pathology reports are structured, multi-granular documents encoding diagnostic conclusions, histological grades, and ancillary test results across one or more anatomical sites; yet existing pathology vision-language models (VLMs) reduce this output to a flat label or free-form text. We present HiPath, a lightweight VLM framework built on frozen UNI2 and Qwen3 backbones that treats structured report prediction as its primary training objective. Three trainable modules totalling 15M parameters address complementary aspects of the problem: a Hierarchical Patch Aggregator (HiPA) for multi-image visual encoding, Hierarchical Contrastive Learning (HiCL) for cross-modal alignment via optimal transport, and Slot-based Masked Diagnosis Prediction (Slot-MDP) for structured diagnosis generation. Trained on 749K real-world Chinese pathology cases from three hospitals, HiPath achieves 68.9% strict…
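
The abstract states that HiCL aligns the two modalities via optimal transport but gives no further detail here. As a rough illustration of that general idea only, the sketch below computes an entropy-regularised transport plan between a few image-patch embeddings and report-token embeddings using Sinkhorn iterations. The cosine cost, the uniform marginals, the Sinkhorn solver, the toy vectors, and every function name are assumptions made for illustration; none of them is taken from the paper.

```python
import math

def cosine_cost(patches, tokens):
    """Cost matrix C[i][j] = 1 - cosine similarity (an assumed cost, not the paper's)."""
    def norm(vec):
        return math.sqrt(sum(x * x for x in vec)) or 1.0
    return [
        [1.0 - sum(a * b for a, b in zip(p, t)) / (norm(p) * norm(t)) for t in tokens]
        for p in patches
    ]

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularised transport plan between uniform marginals (assumed)."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n          # mass assigned to each image patch
    b = [1.0 / m] * m          # mass assigned to each report token
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def ot_distance(cost, plan):
    """Total transport cost under the plan; lower means better-aligned sets."""
    return sum(c * p for c_row, p_row in zip(cost, plan) for c, p in zip(c_row, p_row))

# Toy embeddings (hypothetical): three patches, two report tokens.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
cost = cosine_cost(patches, tokens)
plan = sinkhorn(cost)
distance = ot_distance(cost, plan)
```

In this toy case the plan sends almost all of the first patch's mass to the first token and splits the third patch evenly between both tokens. A contrastive objective could use `distance` as a case-level dissimilarity score, though how HiCL actually uses the transport plan is not specified in the text above.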

Taxonomy

Topics: AI in cancer detection · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications