Spatial-Temporal Pre-Training for Embryo Viability Prediction Using Time-Lapse Videos
Zhiyi Shi, Junsik Kim, Helen Y. Yang, Yonghyun Song, Hyun-Jic Oh, Dalit Ben-Yosef, Daniel Needleman, Hanspeter Pfister

TL;DR
This paper introduces Spatial-Temporal Pre-Training (STPT), a self-supervised learning approach for predicting embryo viability in in vitro fertilization (IVF) from long, variable-length time-lapse videos. It addresses the GPU memory and temporal alignment challenges that these videos pose.
Contribution
STPT is a two-stage (spatial and temporal) pre-training method that efficiently handles long videos and temporal variability without requiring frame alignment, improving embryo viability prediction.
Findings
Achieved the highest AUC among the compared methods (0.635) on a dataset of 23,027 time-lapse videos
Effectively handles long videos and temporal misalignment
Requires limited computational resources
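To make the reported AUC concrete: ROC AUC equals the probability that a randomly chosen positive example (here, a transferred embryo that led to pregnancy) receives a higher score than a randomly chosen negative one, with ties counted as half. An AUC of 0.635 therefore means positives are ranked above negatives about 63.5% of the time, against 50% for random guessing. The minimal sketch below computes AUC directly from that pairwise definition. The function name and the toy scores are hypothetical illustrations, not data or code from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the pairwise (Mann-Whitney) formulation.

    Over all (positive, negative) pairs, count how often the positive
    example has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy model scores; label 1 = pregnancy, 0 = no pregnancy.
scores = [0.9, 0.4, 0.6, 0.3, 0.7, 0.2]
labels = [1, 1, 0, 0, 1, 0]
auc = roc_auc(scores, labels)
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so `auc` is 8/9. The double loop is quadratic, which is fine for illustration. A rank-based implementation is the usual choice for large evaluation sets.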
Abstract
Automating embryo viability prediction for in vitro fertilization (IVF) is important but challenging due to the limited availability of labeled pregnancy outcome data, as only a small fraction of embryos are labeled after transfer. Self-supervised learning (SSL) can leverage both labeled and unlabeled data to improve prediction. However, existing SSL methods for videos are not directly applicable to embryo development videos due to two challenges: (1) embryo time-lapse videos contain hundreds of frames, requiring significant GPU memory for conventional SSL; (2) the dataset contains videos with varying lengths and many outlier frames, causing traditional video alignment methods to struggle with semantic misalignment. We propose Spatial-Temporal Pre-Training (STPT) to address these challenges. STPT includes two stages: spatial and temporal. In each stage, only one encoder is trained while…
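The memory argument for training only one encoder per stage can be illustrated with a generic "freeze and cache" data flow. Suppose, hypothetically, that the spatial encoder is held fixed while a temporal model is trained (the abstract's description of the second encoder is truncated above, so this is an assumption). Each frame can then be embedded once and cached. The temporal model only ever sees short embedding vectors, never hundreds of raw frames with their per-frame activations. The sketch below also pads variable-length videos with a 0/1 mask, which is one common way to batch sequences of different lengths. Everything here is an illustrative assumption and not the authors' STPT objective: the toy encoder, the mean-pooling "temporal model", the function names and the data. No training loop is shown.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def cache_frame_embeddings(
    frames: Sequence[Vector], spatial_encoder: Callable[[Vector], Vector]
) -> List[Vector]:
    """Embed each frame independently with a (hypothetically frozen) spatial encoder.

    Because this encoder is not updated here, its outputs can be computed
    once per frame and cached for the temporal stage.
    """
    return [list(spatial_encoder(f)) for f in frames]


def pad_with_mask(
    videos: Sequence[List[Vector]], dim: int
) -> Tuple[List[List[Vector]], List[List[int]]]:
    """Pad variable-length embedding sequences to a common length; mask 1 = real frame."""
    max_len = max(len(v) for v in videos)
    padded, masks = [], []
    for v in videos:
        n_pad = max_len - len(v)
        padded.append(v + [[0.0] * dim for _ in range(n_pad)])
        masks.append([1] * len(v) + [0] * n_pad)
    return padded, masks


def masked_mean_pool(seq: List[Vector], mask: List[int]) -> Vector:
    """Toy stand-in for a temporal encoder: average only the unmasked frames."""
    n = sum(mask)
    dim = len(seq[0])
    return [sum(x[d] for x, m in zip(seq, mask) if m) / n for d in range(dim)]


def toy_spatial_encoder(frame: Vector) -> Vector:
    """Hypothetical frame encoder: (mean pixel value, max pixel value)."""
    return [sum(frame) / len(frame), max(frame)]


# Two toy "videos" of different lengths (3 and 2 frames, 4 pixels per frame).
video_a = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 6.0]]
video_b = [[4.0, 4.0, 4.0, 4.0], [0.0, 0.0, 0.0, 8.0]]

cached = [cache_frame_embeddings(v, toy_spatial_encoder) for v in (video_a, video_b)]
padded, masks = pad_with_mask(cached, dim=2)
video_reprs = [masked_mean_pool(s, m) for s, m in zip(padded, masks)]
```

In a real pipeline, the cached embeddings would feed a trainable temporal network rather than a fixed average. The design point is only that the stage being trained never has to hold the full frame-level computation graph in memory.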
