ConvFormer3D-TAP: Phase/Uncertainty-Aware Front-End Fusion for Cine CMR View Classification Pipelines

Nafiseh Ghaffar Nia; Vinesh Appadurai; Suchithra V.; Chinmay Rane; Daniel Pittman; James Carr; Adrienne Kline

arXiv:2604.11389 · cs.CV · April 14, 2026

TL;DR

ConvFormer3D-TAP is a novel spatiotemporal neural network architecture that improves cine cardiac MRI view classification accuracy and robustness by integrating convolutional priors with hierarchical attention.

Contribution

The paper introduces a phase/uncertainty-aware fusion model that combines 3D convolutional tokenization and multiscale self-attention for cine MRI view classification.

Findings

01. Achieved 96% validation accuracy on a large clinical dataset.

02. Per-class F1-scores >= 0.94 demonstrate high classification performance.

03. Model shows strong calibration with low expected calibration error.
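
For readers unfamiliar with the calibration metric named in the third finding, the following is a minimal sketch of expected calibration error (ECE) with equal-width confidence bins. The function name, the bin count of 10, and the toy inputs are illustrative assumptions; the listing does not state how the authors binned their predictions or what ECE value they obtained.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE.

    confidences: top-class probability per prediction, each in [0, 1].
    correct:     1 (or True) if the prediction was right, else 0 (or False).
    n_bins:      number of equal-width bins (an assumed default, not from the paper).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Per bin: [sample count, sum of confidences, number of correct predictions].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Half-open bins [lo, hi); a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += int(bool(ok))

    # ECE = sum over non-empty bins of (bin size / N) * |accuracy - mean confidence|.
    ece = 0.0
    for count, conf_sum, hits in bins:
        if count:
            ece += (count / n) * abs(hits / count - conf_sum / count)
    return ece


# Toy data only, not results from the paper.
toy_ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
```

A lower value means predicted confidence tracks observed accuracy more closely; a model that is right 95% of the time when it reports 0.95 confidence contributes nothing to the sum.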

Abstract

Reliable recognition of standard cine cardiac MRI views is essential because each view determines which cardiac anatomy is visualized and which quantitative analyses can be performed. Incorrect view identification, whether by a human reader or an automated deep learning system, can propagate errors into segmentation, volumetric assessment, strain analysis, and valve evaluation. However, accurate view classification remains challenging under routine clinical variability in scanner vendor, acquisition protocol, motion artifacts, and plane prescription. We present ConvFormer3D-TAP, a cine-specific spatiotemporal architecture that integrates 3D convolutional tokenization with multiscale self-attention. The model is trained using masked spatiotemporal reconstruction and uncertainty-weighted multi-clip fusion to enhance robustness across cardiac phases and ambiguous temporal segments. The…
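
To make the idea of uncertainty-weighted multi-clip fusion concrete, here is a minimal sketch under stated assumptions. The truncated abstract does not specify the uncertainty measure or the weighting rule, so this example assumes normalized predictive entropy as the uncertainty and uses one minus that entropy as each clip's weight. The function names `entropy` and `fuse_clips` and the toy probabilities are hypothetical and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fuse_clips(clip_probs, eps=1e-8):
    """Fuse per-clip class probabilities, down-weighting uncertain clips.

    clip_probs: list of probability vectors, one per temporal clip.
    Assumption: weight = 1 - normalized entropy, so a confident clip
    counts more than an ambiguous one. Returns (fused_probs, weights).
    """
    if not clip_probs:
        raise ValueError("at least one clip is required")
    num_classes = len(clip_probs[0])
    if num_classes < 2:
        raise ValueError("need at least two classes")

    max_entropy = math.log(num_classes)  # entropy of the uniform distribution
    # eps keeps the weights valid even if every clip is maximally uncertain.
    raw = [max(1.0 - entropy(p) / max_entropy, 0.0) + eps for p in clip_probs]
    total = sum(raw)
    weights = [w / total for w in raw]

    fused = [
        sum(w * p[c] for w, p in zip(weights, clip_probs))
        for c in range(num_classes)
    ]
    return fused, weights


# Toy example: one confident clip and one ambiguous (uniform) clip.
fused, weights = fuse_clips([[0.9, 0.05, 0.05], [1 / 3, 1 / 3, 1 / 3]])
```

In this toy case a plain average would pull the top-class probability down to roughly 0.62, whereas the entropy-weighted fusion nearly ignores the uniform clip. Other choices, such as learned weights or variance-based uncertainty, would fit the same description equally well.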
