TPA: Temporal Prompt Alignment for Fetal Congenital Heart Defect Classification

Darya Taratynova; Alya Almsouti; Beknur Kalmakhanbet; Numan Saeed; Mohammad Yaqub

arXiv:2508.15298·cs.CV·September 8, 2025


TL;DR

This paper introduces TPA, a novel framework that leverages temporal modeling, prompt-aware contrastive learning, and uncertainty quantification to improve fetal CHD classification accuracy and calibration in ultrasound videos.

Contribution

The paper proposes TPA, combining foundation image-text models with temporal feature extraction and a new calibration module, advancing fetal CHD detection methods.

Findings

01. Achieves 85.40% macro F1 on a private CHD dataset.

02. Reduces calibration error by 5.38%.

03. Improves F1 score by 4.73% on EchoNet-Dynamic.
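For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how macro F1 and a binned calibration error are commonly computed. It is not the authors' evaluation code. The summary does not say which calibration metric the paper reports, so the expected calibration error (ECE) with ten equal-width bins shown here is an assumption, and the function names `macro_f1` and `expected_calibration_error` are hypothetical.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no support and no
        # predictions contributes 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


def expected_calibration_error(confidences, correct, num_bins=10):
    """Assumed metric: ECE over equal-width confidence bins.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: 1 if the prediction was right, else 0.
    """
    n = len(confidences)
    ece = 0.0
    for b in range(num_bins):
        lo, hi = b / num_bins, (b + 1) / num_bins
        # Bins are (lo, hi]; a confidence of exactly 0 goes in the first bin.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += len(idx) / n * abs(acc - conf)
    return ece
```

Macro F1 weights every class equally regardless of how many samples it has, which matters for multi-class CHD classification where some defects are rare. A lower calibration error means the predicted confidences track the observed accuracy more closely.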

Abstract

Congenital heart defect (CHD) detection in ultrasound videos is hindered by image noise and probe positioning variability. While automated methods can reduce operator dependence, current machine learning approaches often neglect temporal information, limit themselves to binary classification, and do not account for prediction calibration. We propose Temporal Prompt Alignment (TPA), a method that leverages a foundation image-text model and prompt-aware contrastive learning to classify fetal CHD in cardiac ultrasound videos. TPA extracts features from each frame of video subclips using an image encoder, aggregates them with a trainable temporal extractor to capture heart motion, and aligns the video representation with class-specific text prompts via a margin-hinge contrastive loss. To enhance calibration for clinical reliability, we introduce a Conditional Variational Autoencoder Style…
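The alignment step described in the abstract can be illustrated with a minimal sketch, under several assumptions. The abstract does not give the exact loss, so the multi-class hinge form used here is an assumption: for each non-target class, it penalizes max(0, margin - s_pos + s_neg) on cosine similarities. Mean pooling over frames stands in for the trainable temporal extractor, and the names `mean_pool`, `cosine_similarities`, `margin_hinge_loss`, and the margin value 0.2 are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_pool(frame_features):
    """Assumed stand-in for the trainable temporal extractor:
    average the per-frame feature vectors of one subclip."""
    num_frames = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / num_frames for d in range(dim)]


def cosine_similarities(video_emb, prompt_embs):
    """Cosine similarity between one video embedding and each class prompt."""
    v = l2_normalize(video_emb)
    return [sum(a * b for a, b in zip(v, l2_normalize(p))) for p in prompt_embs]


def margin_hinge_loss(sims, target, margin=0.2):
    """Assumed hinge form: every wrong-class prompt should score at least
    `margin` below the correct-class prompt."""
    pos = sims[target]
    return sum(max(0.0, margin - pos + s)
               for k, s in enumerate(sims) if k != target)


# Toy example: two frames with 2-D features and two class prompts.
frames = [[1.0, 0.0], [1.0, 0.0]]
prompts = [[1.0, 0.0], [0.0, 1.0]]
sims = cosine_similarities(mean_pool(frames), prompts)
loss_if_class_0 = margin_hinge_loss(sims, target=0)
loss_if_class_1 = margin_hinge_loss(sims, target=1)
```

In the toy example the video embedding matches the first prompt exactly, so the loss is zero when class 0 is the label and positive when class 1 is the label. In a real system the frame features and prompt embeddings would come from the image and text encoders of a foundation model rather than hand-written vectors.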
