# LPA-Tuning CLIP: An Improved CLIP-Based Classification Model for Intestinal Polyps

**Authors:** Zumin Wang, Jun Gao, Wenhao Ping, Jing Qin, Changqing Ji

PMC · DOI: 10.3390/s26061764 · Sensors (Basel, Switzerland) · 2026-03-11

## TL;DR

This paper introduces LPA-Tuning CLIP, a CLIP-based multimodal model that classifies intestinal polyps by pairing endoscopic images with structured pathology descriptions, reaching 85.8% accuracy and an F1 score of 0.862 on an internal test set.

## Contribution

LPA-Tuning CLIP, a multimodal framework that integrates endoscopic images with structured pathology descriptions to improve intestinal polyp classification.

## Key findings

- The proposed model achieves 85.8% accuracy and 0.862 F1 score on an internal test set.
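- It outperforms unimodal baselines by 8.7% and a multimodal baseline by 4.3%.
- The model replaces CLIP's instance-level contrastive loss with cross-modal projection matching (CMPM) plus an ID loss, adds structured clinical text templates that encode WHO diagnostic criteria, and uses medical-aware augmentation that preserves lesion features.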
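As a rough illustration of how the two reported metrics are typically computed for a three-class problem, the sketch below derives accuracy and an F1 score from predicted and true labels. The summary does not say which F1 averaging the authors used, so macro averaging here is an assumption, and the function name and toy labels are hypothetical.

```python
import numpy as np

def accuracy_and_macro_f1(y_true, y_pred, num_classes=3):
    """Return (accuracy, macro-averaged F1) for integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))

    f1_scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom > 0 else 0.0)

    # Macro averaging (an assumption): unweighted mean over classes.
    return accuracy, float(np.mean(f1_scores))

# Toy labels for three hypothetical polyp subtypes (not data from the paper).
acc, f1 = accuracy_and_macro_f1([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
```

Macro averaging weights each subtype equally, which matters when subtypes are imbalanced; a weighted average would give a different number on the same predictions.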

## Abstract

Background and Objective: Accurate classification of intestinal polyps is crucial for preventing colorectal cancer but is hindered by visual similarity among subtypes and endoscopic variability. While deep learning aids in diagnosis, single-modal models face efficiency–accuracy trade-offs and ignore pathological semantics. We propose a multimodal framework that integrates endoscopic images with structured pathological descriptions to bridge this gap.

Methods: We propose LPA-Tuning CLIP, which incorporates three key innovations: replacing CLIP’s instance-level contrastive loss with cross-modal projection matching (CMPM) with ID loss to explicitly optimize intraclass compactness and interclass separation through label-aware image-text similarity matrices; introducing structured clinical semantic templates that encode WHO diagnostic criteria into hierarchical text prompts for consistent pathology annotations; and developing medical-aware augmentation that preserves lesion features while reducing domain shifts.

Results: The experimental results demonstrate that our proposed method achieves an accuracy of 85.8% and an F1 score of 0.862 on the internal test set, establishing a new state-of-the-art performance for intestinal polyp classification.

Conclusions: This study proposes a multimodal polyp classification paradigm that achieves 85.8% accuracy on three-subtype classification via endoscopic image-pathology text joint representation learning, outperforming unimodal baselines by 8.7% and a multimodal baseline by 4.3%.
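To make the label-aware matching idea in the Methods more concrete, here is a minimal NumPy sketch of a CMPM-style loss as it is commonly formulated: each image embedding is projected onto the unit-normalized text embeddings, a softmax turns the projections into a predicted matching distribution, and a KL divergence pulls it toward the label-derived distribution. This follows the generally published form of CMPM and is not confirmed to be the authors' implementation; the ID loss term is omitted, and the function name, `eps`, and the toy embeddings are assumptions.

```python
import numpy as np

def cmpm_loss(image_emb, text_emb, labels, eps=1e-8):
    """CMPM-style loss: image-to-text plus text-to-image KL terms."""
    image_emb = np.asarray(image_emb, dtype=np.float64)
    text_emb = np.asarray(text_emb, dtype=np.float64)
    labels = np.asarray(labels)

    # Label-aware match matrix: y[i, j] = 1 when sample i and j share a class.
    y = (labels[:, None] == labels[None, :]).astype(np.float64)
    # Target distribution q: each row of y normalized to sum to 1.
    q = y / y.sum(axis=1, keepdims=True)

    def one_direction(a, b):
        # Scalar projection of each a_i onto each unit-normalized b_j.
        b_unit = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
        logits = a @ b_unit.T
        # Numerically stable log-softmax over the batch dimension.
        logits = logits - logits.max(axis=1, keepdims=True)
        log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        p = np.exp(log_p)
        # KL(p || q) per row, averaged over the batch.
        return float(np.mean(np.sum(p * (log_p - np.log(q + eps)), axis=1)))

    return one_direction(image_emb, text_emb) + one_direction(text_emb, image_emb)

# Toy batch: three hypothetical classes, two samples each (not data from the paper).
labels = np.array([0, 0, 1, 1, 2, 2])
centers = 10.0 * np.eye(3)
loss_aligned = cmpm_loss(centers[labels], centers[labels], labels)
```

Because the target distribution is built from class labels instead of one-to-one image-text pairs, every same-class pair in the batch counts as a positive, which is what encourages intraclass compactness and interclass separation.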

## Linked entities

- **Diseases:** colorectal cancer (MONDO:0005575)

## Full-text entities

- **Diseases:** Intestinal Polyps (MESH:D007417), colorectal cancer (MESH:D015179), polyp (MESH:D011127)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030754/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC13030754/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030754/full.md

---
Source: https://tomesphere.com/paper/PMC13030754