Spatial-Temporal Pre-Training for Embryo Viability Prediction Using Time-Lapse Videos

Zhiyi Shi; Junsik Kim; Helen Y. Yang; Yonghyun Song; Hyun-Jic Oh; Dalit Ben-Yosef; Daniel Needleman; Hanspeter Pfister

arXiv:2506.17403 · cs.CV · June 24, 2025

TL;DR

This paper introduces Spatial-Temporal Pre-Training (STPT), a self-supervised learning approach for predicting embryo viability from long, variable-length IVF time-lapse videos. It targets two obstacles: the GPU memory cost of videos with hundreds of frames, and the semantic misalignment caused by varying video lengths and outlier frames.

Contribution

STPT is a two-stage pre-training method that efficiently handles long videos and temporal variability without requiring frame alignment, advancing embryo viability prediction.

Findings

01  Achieved highest AUC of 0.635 on 23,027 videos

02  Effectively handles long videos and temporal misalignment

03  Requires limited computational resources
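
For readers unfamiliar with the AUC metric in the first finding, the sketch below shows one standard way to compute it: as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical values, not from the paper):
# 1 = positive outcome, 0 = negative outcome.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported 0.635.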

Abstract

Automating embryo viability prediction for in vitro fertilization (IVF) is important but challenging due to the limited availability of labeled pregnancy outcome data, as only a small fraction of embryos are labeled after transfer. Self-supervised learning (SSL) can leverage both labeled and unlabeled data to improve prediction. However, existing SSL methods for videos are not directly applicable to embryo development videos due to two challenges: (1) embryo time-lapse videos contain hundreds of frames, requiring significant GPU memory for conventional SSL; (2) the dataset contains videos with varying lengths and many outlier frames, causing traditional video alignment methods to struggle with semantic misalignment. We propose Spatial-Temporal Pre-Training (STPT) to address these challenges. STPT includes two stages: spatial and temporal. In each stage, only one encoder is trained while…
