AmCLR: Unified Augmented Learning for Cross-Modal Representations

Ajay Jagannath; Aayush Upadhyay; Anant Mehta

arXiv:2412.07979 · cs.LG · December 12, 2024

TL;DR

AmCLR and xAmCLR are novel contrastive learning frameworks for vision-language models. They improve robustness and efficiency by integrating diverse augmentations (AmCLR) and intra-modal alignment (xAmCLR) while reducing computational requirements.

Contribution

The paper introduces AmCLR and xAmCLR, new contrastive learning objectives that enhance bimodal representation learning through richer augmentations while requiring fewer computational resources.
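To make the role of richer augmentations concrete, here is a minimal sketch of a cross-modal contrastive loss averaged over every pairing of an original or augmented image view with an original or paraphrased caption. It makes several assumptions. It uses plain mini-batch symmetric InfoNCE (the CLIP-style loss) rather than the SogCLR-based global objective the paper builds on. The uniform averaging over view pairs is also an assumption. Finally, the function names (`info_nce`, `symmetric_info_nce`, `augmented_cross_modal_loss`) are hypothetical. This illustrates the idea and is not the authors' exact AmCLR formulation.

```python
import math
from itertools import product


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchors, targets, tau=0.1):
    """One-directional InfoNCE: anchors[i] should match targets[i] against all other targets."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, t) / tau for t in targets]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def symmetric_info_nce(images, texts, tau=0.1):
    """CLIP-style loss: average of the image-to-text and text-to-image directions."""
    return 0.5 * (info_nce(images, texts, tau) + info_nce(texts, images, tau))


def augmented_cross_modal_loss(images, aug_images, texts, para_texts, tau=0.1):
    """Hypothetical AmCLR-style objective: average the symmetric loss over all
    (image view, text view) combinations, i.e. {original, augmented} images x
    {original, paraphrased} captions."""
    view_pairs = list(product([images, aug_images], [texts, para_texts]))
    return sum(symmetric_info_nce(i, t, tau) for i, t in view_pairs) / len(view_pairs)


# Usage with toy 2-D embeddings for a batch of two image-caption pairs.
imgs = [[1.0, 0.0], [0.0, 1.0]]
aug_imgs = [[0.9, 0.1], [0.1, 0.9]]  # stand-in for embeddings of transformed images
caps = [[1.0, 0.0], [0.0, 1.0]]
paras = [[0.8, 0.2], [0.2, 0.8]]  # stand-in for embeddings of paraphrased captions
print(augmented_cross_modal_loss(imgs, aug_imgs, caps, paras))
```

Averaging over view pairs gives each augmented view a positive partner in the other modality. A batch of N pairs therefore supplies four aligned pairings per sample instead of one, without enlarging N.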

Findings

1. AmCLR achieves comparable performance with much smaller batch sizes (a few hundred samples rather than the 32,768 used by CLIP).
2. xAmCLR adds intra-modal alignment, producing richer feature representations.
3. Both methods demonstrate increased robustness and efficiency.
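The intra-modal alignment attributed to xAmCLR can be illustrated with a minimal sketch. It assumes the intra-modal term is a contrastive loss between each image embedding and its augmented view, and between each caption and its paraphrase. That term is then added to a cross-modal loss with a weight. The weight `lam`, the helper names (`symmetric`, `intra_modal_loss`, `combined_loss`), the use of plain InfoNCE, and restricting the cross-modal term to the original pair are all hypothetical simplifications. The paper defines the actual xAmCLR objective.

```python
import math


def info_nce(anchors, targets, tau=0.1):
    """One-directional InfoNCE with cosine similarity; targets[i] is the positive for anchors[i]."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cos(anchor, t) / tau for t in targets]
        m = max(logits)  # stable log-sum-exp
        total += m + math.log(sum(math.exp(z - m) for z in logits)) - logits[i]
    return total / len(anchors)


def symmetric(a, b, tau=0.1):
    """Average of both contrastive directions between two aligned lists of embeddings."""
    return 0.5 * (info_nce(a, b, tau) + info_nce(b, a, tau))


def intra_modal_loss(images, aug_images, texts, para_texts, tau=0.1):
    """Hypothetical intra-modal term: pull each image toward its augmented view and
    each caption toward its paraphrase, contrasted against the rest of the batch."""
    return 0.5 * (symmetric(images, aug_images, tau) + symmetric(texts, para_texts, tau))


def combined_loss(images, aug_images, texts, para_texts, lam=0.5, tau=0.1):
    """Hypothetical xAmCLR-style total: cross-modal term (original pair only, for brevity)
    plus a weighted intra-modal term."""
    cross = symmetric(images, texts, tau)
    return cross + lam * intra_modal_loss(images, aug_images, texts, para_texts, tau)


# Usage with toy 2-D embeddings.
imgs = [[1.0, 0.0], [0.0, 1.0]]
aug_imgs = [[0.9, 0.1], [0.1, 0.9]]
caps = [[1.0, 0.0], [0.0, 1.0]]
paras = [[0.8, 0.2], [0.2, 0.8]]
print(combined_loss(imgs, aug_imgs, caps, paras))
```

Setting `lam=0.0` recovers the purely cross-modal loss. This makes the intra-modal term easy to ablate in a sketch like this one.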

Abstract

Contrastive learning has emerged as a pivotal framework for representation learning, underpinning advances in both unimodal and bimodal applications such as SimCLR and CLIP. To address fundamental limitations like the dependence on large batch sizes and bimodality, methods such as SogCLR leverage stochastic optimization for the global contrastive objective. Inspired by SogCLR's efficiency and adaptability, we introduce the AmCLR and xAmCLR objective functions, tailored for bimodal vision-language models, to further enhance the robustness of contrastive learning. AmCLR integrates diverse augmentations, including text paraphrasing and image transformations, to reinforce the alignment of contrastive representations while keeping the batch size limited to a few hundred samples, whereas CLIP needs a batch size of 32,768 to produce reasonable results. xAmCLR further extends this paradigm by incorporating intra-modal…
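The abstract credits SogCLR with removing the large-batch dependency by stochastically optimizing a global contrastive objective. Below is a minimal sketch of the core mechanism as we understand it from the SogCLR paper. For each training sample i, keep a running estimate u_i of the normalizing term, meaning the average of exp((s_ij - s_ii)/τ) over negatives. Each mini-batch updates u_i with a moving average, and that sample's gradient is scaled by 1/u_i. Because u_i accumulates information across iterations, the estimate does not rely on one very large batch. The class and parameter names (`MovingAverageEstimator`, `gamma`, `tau`), the default values, and the toy similarity matrix are hypothetical. This is not code from the paper or the repository.

```python
import math


def batch_estimate(sims, row, tau):
    """Mini-batch estimate of the normalizer for one anchor: the mean of
    exp((s_ij - s_ii) / tau) over the in-batch negatives j != i."""
    positive = sims[row][row]
    negatives = [s for j, s in enumerate(sims[row]) if j != row]
    return sum(math.exp((s - positive) / tau) for s in negatives) / len(negatives)


class MovingAverageEstimator:
    """Per-sample running estimates u_i, indexed by dataset position (hypothetical helper)."""

    def __init__(self, num_samples, gamma=0.9, tau=0.1):
        self.u = [0.0] * num_samples  # one scalar per training example
        self.gamma = gamma  # weight given to the newest mini-batch estimate (illustrative value)
        self.tau = tau  # temperature

    def update(self, sims, indices):
        """sims: batch similarity matrix (row i = anchor, column i = its positive);
        indices: dataset index of each row. Returns per-anchor gradient weights 1 / u_i."""
        weights = []
        for row, idx in enumerate(indices):
            g = batch_estimate(sims, row, self.tau)
            self.u[idx] = (1 - self.gamma) * self.u[idx] + self.gamma * g
            weights.append(1.0 / self.u[idx])
        return weights


# Usage: a batch of two anchors that map to dataset samples 0 and 2.
est = MovingAverageEstimator(num_samples=4, gamma=0.9, tau=1.0)
sims = [[1.0, 0.0], [0.0, 1.0]]
est.update(sims, indices=[0, 2])
print(est.u)  # samples 1 and 3 were not in this batch, so their estimates stay 0.0
```

In a full training loop, each anchor's gradient would be scaled by its returned weight. The sketch only tracks the estimates; it does not compute gradients.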

Code & Models

Repositories

aaupadhy/AmCLR
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Speech Recognition and Synthesis

Methods: Average Pooling · Kaiming Initialization · Global Average Pooling · Max Pooling · Dense Connections · Convolution · Random Gaussian Blur · Color Jitter