# A Balanced Multimodal Multi-Task Deep Learning Framework for Robust Patient-Specific Quality Assurance

**Authors:** Xiaoyang Zeng, Awais Ahmed, Muhammad Hanif Tunio

PMC · DOI: 10.3390/diagnostics15202555 · Diagnostics · 2025-10-10

## TL;DR

This paper introduces BMMQA, a deep learning framework that improves patient-specific quality assurance (PSQA) in radiotherapy by balancing the contributions of image-based dose matrices and tabular plan complexity metrics during training.

## Contribution

BMMQA balances modalities by using Shapley values to quantify each modality's contribution, and it applies task-specific fusion strategies for robust multi-task learning.
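
To make the Shapley-based contribution idea concrete, here is a minimal sketch of exact Shapley values for a two-modality model. It is not the authors' implementation: the function name `shapley_values`, the `scores` table, and the numbers in it are hypothetical, and the choice of what a coalition's "score" means (for example, a validation metric when only that subset of modalities is fed to the network) is an assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small set of players (here, modalities).

    `value` maps a frozenset of players to the score obtained when only
    those players are available to the model.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Weight of a coalition of size k in the Shapley formula.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal gain from adding player p to coalition s.
                total += weight * (value[s | {p}] - value[s])
        phi[p] = total
    return phi


# Hypothetical scores for each modality combination (illustrative only).
scores = {
    frozenset(): 0.0,
    frozenset({"image"}): 0.2,
    frozenset({"tabular"}): 0.6,
    frozenset({"image", "tabular"}): 0.7,
}

phi = shapley_values(["image", "tabular"], scores)
print(phi)
```

With these illustrative numbers the tabular modality receives a much larger share than the image modality, which is the kind of imbalance a balancing mechanism would aim to detect. The two values always sum to the score of the full model minus the score of the empty coalition.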

## Key findings

- BMMQA outperforms existing fusion baselines under the standard 2%/3 mm GPR criterion and achieves a 15.7% reduction in MAE under the stricter 2%/2 mm criterion.
- The framework achieves a peak SSIM of 0.964 in dose distribution prediction.
- It enhances robustness in critical failure cases (GPR < 90%).

## Abstract

Background: Multimodal deep learning has emerged as a crucial method for automated patient-specific quality assurance (PSQA) in radiotherapy research. Integrating image-based dose matrices with tabular plan complexity metrics enables more accurate prediction of quality indicators, including the Gamma Passing Rate (GPR) and dose difference (DD). However, modality imbalance remains a significant challenge, as tabular encoders often dominate training, suppressing image encoders and reducing model robustness. This issue becomes more pronounced under task heterogeneity, with GPR prediction relying more on tabular data, whereas dose difference prediction (DDP) depends heavily on image features.

Methods: We propose BMMQA (Balanced Multi-modal Quality Assurance), a novel framework that achieves modality balance by adjusting modality-specific loss factors to control convergence dynamics. The framework introduces four key innovations: (1) task-specific fusion strategies (softmax-weighted attention for GPR regression and spatial cascading for DD prediction); (2) a balancing mechanism supported by Shapley values to quantify modality contributions; (3) a fast network forward mechanism for efficient computation of different modality combinations; and (4) a modality-contribution-based task weighting scheme for multi-task multimodal learning. A large-scale multimodal dataset comprising 1370 IMRT plans was curated in collaboration with Peking Union Medical College Hospital (PUMCH).

Results: Experimental results demonstrate that, under the standard 2%/3 mm GPR criterion, BMMQA outperforms existing fusion baselines. Under the stricter 2%/2 mm criterion, it achieves a 15.7% reduction in mean absolute error (MAE). The framework also enhances robustness in critical failure cases (GPR < 90%) and achieves a peak SSIM of 0.964 in dose distribution prediction.

Conclusions: Explicit modality balancing improves predictive accuracy and strengthens clinical trustworthiness by mitigating overreliance on a single modality. This work highlights the importance of addressing modality imbalance for building trustworthy and robust AI systems in PSQA and establishes a pioneering framework for multi-task multimodal learning.
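
The abstract names softmax-weighted attention as the fusion strategy for GPR regression but does not give its form. As a rough illustration of the general idea only, the sketch below fuses two equal-length feature vectors with weights obtained from a softmax over per-modality scores. The names `softmax` and `fuse`, the feature values, and the scalar scores are all hypothetical; in a trained network the scores would typically be produced by learned layers, which are not modeled here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features, logits):
    """Weighted sum of per-modality feature vectors.

    `features` is a list of equal-length vectors, one per modality;
    `logits` holds one scalar score per modality (assumed given here).
    """
    weights = softmax(logits)
    dim = len(features[0])
    return [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(dim)
    ]


# Hypothetical embeddings for the image and tabular branches.
image_feat = [1.0, 2.0]
tabular_feat = [3.0, 4.0]

# Equal scores give equal weights, so the result is the element-wise mean.
fused_equal = fuse([image_feat, tabular_feat], [0.0, 0.0])

# A higher tabular score pulls the fused vector toward the tabular features.
fused_tabular_heavy = fuse([image_feat, tabular_feat], [0.0, 2.0])
print(fused_equal, fused_tabular_heavy)
```

Because the softmax weights are positive and sum to one, the fused vector always lies between the two modality vectors, and no modality's weight can be driven exactly to zero.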

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12564269/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12564269/full.md

## References

63 references — full list in the complete paper: https://tomesphere.com/paper/PMC12564269/full.md

---
Source: https://tomesphere.com/paper/PMC12564269