ConvFormer3D-TAP: Phase/Uncertainty-Aware Front-End Fusion for Cine CMR View Classification Pipelines
Nafiseh Ghaffar Nia, Vinesh Appadurai, Suchithra V., Chinmay Rane, Daniel Pittman, James Carr, Adrienne Kline

TL;DR
ConvFormer3D-TAP is a novel spatiotemporal neural network architecture that improves cine cardiac MRI view classification accuracy and robustness by integrating convolutional priors with hierarchical attention.
Contribution
It introduces a phase/uncertainty-aware fusion model combining 3D convolutional tokenization and multiscale self-attention for cine MRI view classification.
Findings
Achieved 96% validation accuracy on a large clinical dataset.
Per-class F1-scores >= 0.94 demonstrate high classification performance.
Model shows strong calibration with low expected calibration error.
Abstract
Reliable recognition of standard cine cardiac MRI views is essential because each view determines which cardiac anatomy is visualized and which quantitative analyses can be performed. Incorrect view identification, whether by a human reader or an automated deep learning system, can propagate errors into segmentation, volumetric assessment, strain analysis, and valve evaluation. However, accurate view classification remains challenging under routine clinical variability in scanner vendor, acquisition protocol, motion artifacts, and plane prescription. We present ConvFormer3D-TAP, a cine-specific spatiotemporal architecture that integrates 3D convolutional tokenization with multiscale self-attention. The model is trained using masked spatiotemporal reconstruction and uncertainty-weighted multi-clip fusion to enhance robustness across cardiac phases and ambiguous temporal segments. The…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
