TPA: Temporal Prompt Alignment for Fetal Congenital Heart Defect Classification
Darya Taratynova, Alya Almsouti, Beknur Kalmakhanbet, Numan Saeed, Mohammad Yaqub

TL;DR
TPA is a framework that combines temporal modeling, prompt-aware contrastive learning, and uncertainty quantification to improve the accuracy and calibration of fetal CHD classification in ultrasound videos.
Contribution
The paper proposes TPA, which combines a foundation image-text model with a trainable temporal feature extractor and a new calibration module for fetal CHD classification.
Findings
Achieves 85.40% macro F1 on a private CHD dataset.
Reduces calibration error by 5.38%.
Improves F1 score by 4.73% on EchoNet-Dynamic.
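For readers unfamiliar with the two kinds of metric reported above, the following is a minimal sketch of how macro F1 and a calibration error can be computed. It is not the authors' evaluation code. The function names are hypothetical, and the use of binned expected calibration error (ECE) with 10 equal-width bins is an assumption, since the summary does not say which calibration metric the paper reports.

```python
import numpy as np

def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1_scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0 here.
        f1_scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1_scores))

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted gap between accuracy and mean confidence.

    Assumption: equal-width bins over (0, 1]; the paper may use a
    different calibration metric or binning scheme.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # mask.mean() is the bin's sample share
    return float(ece)

# Toy three-class example with made-up labels.
f1 = macro_f1([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
ece = expected_calibration_error([0.85, 0.85, 0.85, 0.85], [1, 1, 1, 0])
```

Macro F1 weights every class equally, which matters when some CHD classes are rare. ECE is low when the model's stated confidence matches how often it is actually right.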
Abstract
Congenital heart defect (CHD) detection in ultrasound videos is hindered by image noise and probe positioning variability. While automated methods can reduce operator dependence, current machine learning approaches often neglect temporal information, limit themselves to binary classification, and do not account for prediction calibration. We propose Temporal Prompt Alignment (TPA), a method leveraging a foundation image-text model and prompt-aware contrastive learning to classify fetal CHD in cardiac ultrasound videos. TPA extracts features from each frame of video subclips using an image encoder, aggregates them with a trainable temporal extractor to capture heart motion, and aligns the video representation with class-specific text prompts via a margin-hinge contrastive loss. To enhance calibration for clinical reliability, we introduce a Conditional Variational Autoencoder Style…
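The pipeline described in the abstract (per-frame features, temporal aggregation, alignment with class prompts) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The softmax-weighted pooling stands in for the trainable temporal extractor, the hinge formulation and the margin value of 0.2 are assumptions, and random or hand-built vectors replace real image and text encoder outputs. All function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def temporal_pool(frame_feats, frame_scores):
    """Aggregate (T, D) frame features into one (D,) video embedding.

    Assumption: a softmax-weighted average is used here as a simple
    stand-in for the paper's trainable temporal extractor.
    """
    w = np.exp(frame_scores - frame_scores.max())
    w = w / w.sum()
    return w @ frame_feats

def margin_hinge_loss(video_emb, prompt_embs, label, margin=0.2):
    """Hinge loss pushing the true-class prompt above the others.

    Assumed form: sum over wrong classes c of
    max(0, margin - sim(true) + sim(c)), with cosine similarity.
    """
    v = l2_normalize(video_emb)
    p = l2_normalize(prompt_embs)
    sims = p @ v                        # (C,) cosine similarities
    violations = np.maximum(0.0, margin - sims[label] + sims)
    violations[label] = 0.0             # the true class is not penalized
    return float(violations.sum()), sims

# Toy example: 3 classes, 4-dim embeddings, 5 frames per subclip.
prompts = np.eye(3, 4)                              # placeholder text embeddings
frames = np.tile(prompts[0], (5, 1))                # every frame matches class 0
video = temporal_pool(frames, frame_scores=np.zeros(5))

loss_correct, sims = margin_hinge_loss(video, prompts, label=0)
loss_wrong, _ = margin_hinge_loss(video, prompts, label=1)
predicted_class = int(np.argmax(sims))
```

At inference time, the class whose prompt embedding is most similar to the video embedding is taken as the prediction. The loss is zero once the true-class similarity exceeds every other similarity by at least the margin.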
