Time Series, Vision, and Language: Exploring the Limits of Alignment in Contrastive Representation Spaces
Pratham Yashwante, Rose Yu

TL;DR
This paper investigates the alignment of time series, vision, and language representations in contrastive spaces, revealing asymmetries and the influence of model size and data richness on multimodal alignment.
Contribution
It extends the Platonic Representation Hypothesis to include time series and analyzes the geometric and scaling properties of contrastive alignment across modalities.
Findings
Independently pretrained time series, vision, and language encoders exhibit near-orthogonal geometry in the absence of explicit coupling.
Alignment improves with model size but is asymmetric, favoring vision over language.
Richer textual descriptions enhance alignment only up to a certain point.
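As a rough illustration of how the similarity of two independently trained encoders could be quantified, the sketch below computes linear Centered Kernel Alignment (CKA) between two embedding matrices for the same set of paired samples. This is an assumption for illustration only: this summary does not state which similarity measure the paper uses, and the function name `linear_cka` and the array shapes are hypothetical. Values near 0 indicate largely unrelated geometry, and values near 1 indicate geometry that matches up to rotation and scale.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between embeddings x (n, d1) and y (n, d2) of the same n samples.

    Hypothetical helper: the paper's own metric is not specified in this summary.
    """
    # Center each feature dimension so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by self-covariances.
    cross = np.linalg.norm(x.T @ y, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ts_emb = rng.normal(size=(512, 16))   # stand-in for time series embeddings
    img_emb = rng.normal(size=(512, 24))  # stand-in for unrelated image embeddings
    print("unrelated encoders:", linear_cka(ts_emb, img_emb))
    print("identical encoders:", linear_cka(ts_emb, ts_emb))
```

Because linear CKA compares sample-by-sample structure, it works even when the two encoders have different embedding dimensions, which is the usual situation for independently pretrained models.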
Abstract
The Platonic Representation Hypothesis posits that learned representations from models trained on different modalities converge to a shared latent structure of the world. However, this hypothesis has largely been examined in vision and language, and it remains unclear whether time series participate in such convergence. We first examine this in a trimodal setting and find that independently pretrained time series, vision, and language encoders exhibit near-orthogonal geometry in the absence of explicit coupling. We then apply post-hoc alignment by training projection heads over frozen encoders using contrastive learning, and analyze the resulting representations with respect to geometry, scaling behavior, and dependence on information density and input modality characteristics. Our investigation reveals that overall alignment in contrastive representation spaces improves with model…
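The post-hoc alignment step described above (projection heads trained over frozen encoders with a contrastive objective) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it assumes linear projection heads, a symmetric InfoNCE (CLIP-style) loss, and a temperature of 0.07, none of which are specified in this summary. The names `symmetric_info_nce`, `w_a`, and `w_b` are hypothetical. Only the projection matrices would be trained; the encoder outputs `feats_a` and `feats_b` are treated as fixed inputs.

```python
import numpy as np


def l2_normalize(z: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(feats_a, feats_b, w_a, w_b, temperature=0.07) -> float:
    """Contrastive loss over paired frozen features from two modalities.

    feats_a: (n, d_a) frozen encoder outputs, e.g. time series (assumed shape).
    feats_b: (n, d_b) frozen encoder outputs, e.g. images or text (assumed shape).
    w_a, w_b: (d_a, k) and (d_b, k) linear projection heads (hypothetical choice).
    Row i of feats_a and row i of feats_b are assumed to describe the same sample.
    """
    z_a = l2_normalize(feats_a @ w_a)
    z_b = l2_normalize(feats_b @ w_b)
    logits = z_a @ z_b.T / temperature          # (n, n) similarity matrix
    idx = np.arange(logits.shape[0])            # matching pairs lie on the diagonal
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # a retrieves b
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # b retrieves a
    return float(0.5 * (loss_ab + loss_ba))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 16))
    eye = np.eye(16)
    print("matched pairs:", symmetric_info_nce(feats, feats, eye, eye))
    print("mismatched pairs:", symmetric_info_nce(feats, np.roll(feats, 1, axis=0), eye, eye))
```

For a trimodal setup, one plausible extension (again an assumption) is to sum this loss over the modality pairs being aligned, with one projection head per modality.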
Taxonomy
Topics: Language and cultural evolution · Action Observation and Synchronization · Embodied and Extended Cognition
