# ECSA: Mitigating Catastrophic Forgetting and Few-Shot Generalization in Medical Visual Question Answering

**Authors:** Qinhao Jia, Shuxian Liu, Mingliang Chen, Tianyi Li, Jing Yang

PMC · DOI: 10.3390/tomography11100115 · Tomography · 2025-10-20

## TL;DR

This paper introduces ECSA, a framework for medical visual question answering (Med-VQA) that addresses weak few-shot generalization under data scarcity and catastrophic forgetting during continual learning within a single architecture.

## Contribution

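The Evolvable Clinical-Semantic Alignment (ECSA) framework pairs pre-trained BiomedCLIP (vision) and Flan-T5 (language) models with two modules: a Clinical-Semantic Disambiguation Module (CSDM), which uses debiased hard negative mining in contrastive learning to improve few-shot generalization, and a Prompt-based Knowledge Consolidation Module (PKC), which accumulates and retrieves task-specific soft prompts to limit catastrophic forgetting without rehearsing past data.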
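
The abstract describes one of these modules, the Prompt-based Knowledge Consolidation Module (PKC), as a rehearsal-free store that accumulates and retrieves task-specific soft prompts. The sketch below illustrates that idea only in outline. The `PromptPool` class, the use of one key vector per task, and cosine-similarity retrieval are all assumptions made for illustration; the summary does not say how the paper keys or selects prompts.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PromptPool:
    """Hypothetical store of (task, key, soft prompt) entries."""

    def __init__(self):
        self.entries = []  # list of (task_name, key_vector, prompt_vectors)

    def add(self, task_name, key, prompt):
        # Accumulate: earlier entries are never overwritten, so knowledge of
        # old tasks is kept without storing any of their training data.
        self.entries.append((task_name, list(key), [list(p) for p in prompt]))

    def retrieve(self, query, top_k=1):
        # Rank stored keys by similarity to the query feature (assumed rule).
        ranked = sorted(
            self.entries, key=lambda e: cosine(query, e[1]), reverse=True
        )
        return ranked[:top_k]


# Toy vectors, purely illustrative.
pool = PromptPool()
pool.add("radiology", key=[1.0, 0.0], prompt=[[0.1, 0.2], [0.3, 0.4]])
pool.add("pathology", key=[0.0, 1.0], prompt=[[0.5, 0.6], [0.7, 0.8]])
best = pool.retrieve(query=[0.9, 0.1], top_k=1)
```

In a real system the retrieved prompt vectors would presumably be prepended to the language model's input embeddings, but that wiring is not described in this summary and is left out here.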

## Key findings

- ECSA achieves strong performance with 80.15% accuracy on VQA-RAD and 85.10% on SLAKE.
- The framework shows good generalization with 64.57% accuracy on PathVQA and 82.23% on VQA-Med-2019.
- ECSA maintains a low forgetting rate of 13.50% in continual learning scenarios.
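
This summary does not state how the forgetting rate is computed. A minimal sketch, assuming a definition commonly used in continual learning: for each task except the last, take the best accuracy reached on it before the final training stage minus its accuracy after the final stage, then average. The function name and the accuracy matrix are hypothetical and are not results from the paper.

```python
def average_forgetting(acc):
    """acc[i][j] = accuracy (%) on task j after training on task i (j <= i).

    Assumed definition, not confirmed by the paper: mean over all tasks but
    the last of (best earlier accuracy - accuracy after the final task).
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    final = acc[-1]
    drops = []
    for j in range(num_tasks - 1):
        best_before = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_before - final[j])
    return sum(drops) / len(drops)


# Made-up accuracies for three sequential tasks (illustrative only).
toy_acc = [
    [80.0],
    [70.0, 85.0],
    [65.0, 75.0, 60.0],
]
forgetting = average_forgetting(toy_acc)
```

Under this definition a lower value means earlier tasks are better retained.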

## Abstract

Objective: Medical Visual Question Answering (Med-VQA), a key technology that integrates computer vision and natural language processing to assist in clinical diagnosis, possesses significant potential for enhancing diagnostic efficiency and accuracy. However, its development is constrained by two major bottlenecks: weak few-shot generalization capability stemming from the scarcity of high-quality annotated data and the problem of catastrophic forgetting when continually learning new knowledge. Existing research has largely addressed these two challenges in isolation, lacking a unified framework.

Methods: To bridge this gap, this paper proposes a novel Evolvable Clinical-Semantic Alignment (ECSA) framework, designed to synergistically solve these two challenges within a single architecture. ECSA is built upon powerful pre-trained vision (BiomedCLIP) and language (Flan-T5) models, with two innovative modules at its core. First, we design a Clinical-Semantic Disambiguation Module (CSDM), which employs a novel debiased hard negative mining strategy for contrastive learning. This enables the precise discrimination of “hard negatives” that are visually similar but clinically distinct, thereby significantly enhancing the model’s representation ability in few-shot and long-tail scenarios. Second, we introduce a Prompt-based Knowledge Consolidation Module (PKC), which acts as a rehearsal-free non-parametric knowledge store. It consolidates historical knowledge by dynamically accumulating and retrieving task-specific “soft prompts,” thus effectively circumventing catastrophic forgetting without relying on past data.

Results: Extensive experimental results on four public benchmark datasets, VQA-RAD, SLAKE, PathVQA, and VQA-Med-2019, demonstrate ECSA’s state-of-the-art or highly competitive performance. Specifically, ECSA achieves excellent overall accuracies of 80.15% on VQA-RAD and 85.10% on SLAKE, while also showing strong generalization with 64.57% on PathVQA and 82.23% on VQA-Med-2019. More critically, in continual learning scenarios, the framework achieves a low forgetting rate of just 13.50%, showcasing its significant advantages in knowledge retention.

Conclusions: These findings validate the framework’s substantial potential for building robust and evolvable clinical decision support systems.
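
The abstract says the CSDM relies on debiased hard negative mining for contrastive learning, but this summary does not give the loss. As a stand-in, the sketch below uses the debiased, hardness-weighted contrastive objective known from the general contrastive-learning literature (Robinson et al.'s hard-negative formulation); it is not confirmed to be the paper's formulation. The function name, the default values of `temperature`, `tau_plus`, and `beta`, and the toy similarities are assumptions.

```python
import math


def debiased_hard_negative_loss(pos_sim, neg_sims, temperature=0.5,
                                tau_plus=0.1, beta=1.0):
    """Contrastive loss for one anchor with hardness-weighted, debiased negatives.

    pos_sim:  cosine similarity between the anchor and its positive.
    neg_sims: cosine similarities between the anchor and N sampled negatives.
    tau_plus: assumed prior probability that a sampled "negative" is in fact
              a positive (the debiasing term).
    beta:     concentration; larger values up-weight harder negatives.
    """
    n = len(neg_sims)
    pos = math.exp(pos_sim / temperature)
    neg = [math.exp(s / temperature) for s in neg_sims]

    # Importance weights: negatives that look more like the anchor count more.
    weights = [v ** beta for v in neg]
    mean_weight = sum(weights) / n
    reweighted = sum(w * v for w, v in zip(weights, neg)) / mean_weight

    # Subtract the expected contribution of false negatives, then clamp so
    # the estimate stays above its theoretical minimum.
    n_g = (reweighted - tau_plus * n * pos) / (1.0 - tau_plus)
    n_g = max(n_g, n * math.exp(-1.0 / temperature))

    return -math.log(pos / (pos + n_g))


# Toy similarities (hypothetical): the second call swaps in one hard negative.
easy = debiased_hard_negative_loss(0.9, [0.1, 0.0, -0.2])
hard = debiased_hard_negative_loss(0.9, [0.8, 0.0, -0.2])
```

With these toy inputs the loss is larger when a visually similar negative is present, which is the pressure that pushes the encoder to separate look-alike but clinically distinct cases.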

## Full-text entities

- **Genes:** PRRT2 (proline rich transmembrane protein 2) [NCBI Gene 112476] {aka BFIC2, BFIS2, DSPB3, DYT10, EKD1, FICCA}
- **Diseases:** VQA (MESH:D014786), lung squamous cell carcinoma (MESH:D002294), liver (MESH:D017093), cancers (MESH:D009369), ECSA (MESH:D057180), hallucination (MESH:D006212), lung cancer (MESH:D008175), loss (MESH:D016388), abnormalities in (MESH:D000014), injury to (MESH:D014947), lung adenocarcinoma (MESH:D000077192)
- **Chemicals:** VRAM (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12567919/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12567919/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/PMC12567919/full.md

---
Source: https://tomesphere.com/paper/PMC12567919