CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging

Raza Imam; Mohammed Talha Alam; Umaima Rahman; Mohsen Guizani; Fakhri Karray

arXiv:2407.07315·cs.CV·November 22, 2024

TL;DR

CosmoCLIP is a specialized astronomical image-text contrastive learning framework that fine-tunes a pre-trained CLIP model using SpaceNet and BLIP captions, achieving superior zero-shot performance in astronomical tasks.

Contribution

We introduce CosmoCLIP, a novel astronomical contrastive learning framework that leverages SpaceNet and BLIP captions to enhance generalization of vision-language models in astronomy.

Findings

01 Outperforms CLIP in zero-shot classification

02 Achieves superior image-text retrieval accuracy

03 Demonstrates strong generalization across tasks
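
For readers unfamiliar with the first finding, zero-shot classification with a CLIP-style model can be sketched as follows: each class name is turned into a text prompt, embedded, and compared with the image embedding by cosine similarity, and the most similar prompt gives the predicted class. This is a minimal illustration, not the authors' evaluation code. The class labels and three-dimensional embedding values are made-up placeholders standing in for real encoder outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_emb, class_text_embs):
    """Return the label whose text embedding is closest to the image embedding."""
    scores = {label: cosine(image_emb, emb) for label, emb in class_text_embs.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical text embeddings for prompts such as "a photo of a galaxy".
# In practice these would come from the model's text encoder.
class_text_embs = {
    "galaxy": [0.9, 0.1, 0.0],
    "nebula": [0.1, 0.9, 0.1],
    "planet": [0.0, 0.1, 0.9],
}

# Hypothetical image embedding from the model's image encoder.
image_emb = [0.8, 0.3, 0.1]

predicted, scores = zero_shot_classify(image_emb, class_text_embs)
print(predicted)
```

No class-specific training is involved at prediction time; swapping in a different label set only requires embedding new prompts.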

Abstract

Existing vision-text contrastive learning models enhance representation transferability and support zero-shot prediction by pulling paired image and caption embeddings together while pushing unrelated pairs apart. However, astronomical image-label datasets are significantly smaller than the general image-label datasets available from the internet. We introduce CosmoCLIP, an astronomical image-text contrastive learning framework obtained by fine-tuning the pre-trained CLIP model on SpaceNet and BLIP-based captions. SpaceNet, obtained via FLARE, comprises ~13k optimally distributed images, while BLIP acts as a rich knowledge extractor. The rich semantics derived from these SpaceNet and BLIP descriptions, when learned contrastively, enable CosmoCLIP to achieve superior generalization across various in-domain and out-of-domain tasks. Our results demonstrate that CosmoCLIP is a…
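
The contrastive objective described in the abstract (matching paired image and caption embeddings while pushing unrelated pairs apart) is commonly implemented as a symmetric cross-entropy over a similarity matrix. The sketch below shows that general CLIP-style loss in plain Python. It is an assumption-laden illustration rather than the paper's training code: the temperature of 0.07 and the toy two-dimensional embeddings are placeholders chosen for the example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Average of image-to-text and text-to-image cross-entropy losses.

    Row i of image_embs is assumed to be paired with row i of text_embs.
    The temperature value is an illustrative assumption.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j]: scaled similarity between image i and caption j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # caption-to-image view
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch: correctly paired embeddings versus deliberately mismatched ones.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is near zero when each image is most similar to its own caption and grows when pairs are mismatched, which is the signal fine-tuning uses to align the two embedding spaces.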

Taxonomy

Topics: Astronomical Observations and Instrumentation

Methods: Contrastive Language-Image Pre-training · BLIP: Bootstrapping Language-Image Pre-training · Contrastive Learning