CLIPTime: Time-Aware Multimodal Representation Learning from Images and Text

Anju Rani; Daniel Ortiz-Arroyo; Petar Durdevic

arXiv:2508.00447 · cs.CV · August 4, 2025

Open Access

TL;DR

CLIPTime is a CLIP-based multimodal framework that predicts both the developmental stage and the timestamp of fungal growth from image and text inputs, enabling time-aware biological analysis without requiring explicit temporal input at test time.

Contribution

This work introduces CLIPTime, a multitask model that jointly predicts growth stages and timestamps, and provides a synthetic fungal growth dataset, annotated with aligned timestamps and categorical stage labels, for training and evaluation.

Findings

01

Effectively models biological progression.

02

Produces interpretable, temporally grounded outputs.

03

Outperforms baseline methods in time-aware prediction.

Abstract

Understanding the temporal dynamics of biological growth is critical across diverse fields such as microbiology, agriculture, and biodegradation research. Although vision-language models like Contrastive Language Image Pretraining (CLIP) have shown strong capabilities in joint visual-textual reasoning, their effectiveness in capturing temporal progression remains limited. To address this, we propose CLIPTime, a multimodal, multitask framework designed to predict both the developmental stage and the corresponding timestamp of fungal growth from image and text inputs. Built upon the CLIP architecture, our model learns joint visual-textual embeddings and enables time-aware inference without requiring explicit temporal input during testing. To facilitate training and evaluation, we introduce a synthetic fungal growth dataset annotated with aligned timestamps and categorical stage labels.…
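The abstract describes a multitask setup in which one model outputs a categorical growth stage and a continuous timestamp. As a rough illustration only, the sketch below shows one common way to combine a classification loss and a regression loss into a single training objective. The excerpt above does not specify the loss, the prediction heads, or how the two tasks are weighted, so the function names (`softmax`, `multitask_loss`), the cross-entropy plus squared-error form, and the `weight` parameter are all assumptions made for illustration, not the authors' method.

```python
import math


def softmax(logits):
    """Convert raw stage scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multitask_loss(stage_logits, stage_label, time_pred, time_true, weight=1.0):
    """Hypothetical joint objective: stage classification + timestamp regression.

    stage_logits: list of scores, one per growth stage (assumed head output)
    stage_label:  index of the true stage
    time_pred:    predicted timestamp (assumed scalar regression output)
    time_true:    ground-truth timestamp
    weight:       assumed trade-off between the two tasks
    """
    # Cross-entropy on the predicted stage distribution.
    probs = softmax(stage_logits)
    ce = -math.log(probs[stage_label])
    # Squared error on the predicted timestamp.
    se = (time_pred - time_true) ** 2
    return ce + weight * se


# Example: three hypothetical stages, true stage is index 1,
# predicted time 10.5 against a true time of 12.0 (units are arbitrary here).
loss = multitask_loss([0.2, 1.5, -0.3], 1, 10.5, 12.0, weight=0.1)
print(loss)
```

With a shared objective of this kind, the `weight` term controls how strongly timestamp errors influence training relative to stage errors; in practice it would need tuning to the scale of the timestamps.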

Taxonomy

Topics: Multimodal Machine Learning Applications · Cell Image Analysis Techniques · Domain Adaptation and Few-Shot Learning