Beyond Augmentation: Cross-Modal Transformer Fusion with Bi-directional Attention for Low-Data Aneurysm Screening
Antara Titikhsha, Divyanshu Tak

TL;DR
This paper introduces CMTF-Net, a cross-modal transformer model that improves low-data aneurysm screening by encoding vascular anatomy and providing interpretable, localized activation.
Contribution
It presents a novel anatomically structured reasoning framework that enhances aneurysm detection accuracy and interpretability in low-data scenarios.
Findings
Achieves near-perfect AUC-ROC in aneurysm screening
Maintains high precision despite class imbalance
Provides spatially localized activation maps for interpretability
Abstract
Intracranial aneurysm rupture causes subarachnoid hemorrhage with mortality near 50%, making early detection critical. Although CTA enables rapid screening, detecting small aneurysms within the complex three-dimensional branching of the Circle of Willis remains expertise-dependent. Existing automated systems are constrained by class imbalance, skull-base artifacts that mimic vascular contrast, and reliance on global binary classification without structured localization, limiting surgical relevance and interpretability. We propose CMTF-Net, a cross-modal target fusion framework that reframes aneurysm screening as anatomically structured reasoning. By supervising 14 vascular territories independently, the network encodes Circle of Willis geometry while allowing multi-segment activation, aligning model design with clinical workflow. CMTF-Net achieves near-perfect AUC-ROC with narrow…
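The abstract does not give implementation details, so the following is only a minimal sketch of what "supervising 14 vascular territories independently" with "multi-segment activation" could look like as a multi-label objective: one sigmoid per territory with its own binary cross-entropy term, so several territories can be positive in the same scan. The function names, the `pos_weight` term for class imbalance, and the max-over-territories study score are illustrative assumptions, not the authors' method.

```python
import math

NUM_TERRITORIES = 14  # count taken from the abstract; territory names are not listed there


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def territory_bce(logits, labels, pos_weight=1.0):
    """Mean binary cross-entropy over independently supervised territories.

    Each territory gets its own sigmoid, so more than one can be positive.
    pos_weight > 1 up-weights positive labels (hypothetical imbalance remedy).
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total -= pos_weight * y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(logits)


def study_score(logits):
    """Study-level probability, assumed here to be the max territory probability."""
    return max(sigmoid(z) for z in logits)


# Hypothetical scan with two positive territories (indices 3 and 9).
logits = [-4.0] * NUM_TERRITORIES
labels = [0] * NUM_TERRITORIES
for idx, value in ((3, 3.0), (9, 2.0)):
    logits[idx] = value
    labels[idx] = 1

loss = territory_bce(logits, labels, pos_weight=2.0)
score = study_score(logits)
```

A softmax over territories would force exactly one positive segment per scan; independent sigmoids avoid that constraint, which is the property the abstract describes as multi-segment activation.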
