AmCLR: Unified Augmented Learning for Cross-Modal Representations
Ajay Jagannath, Aayush Upadhyay, Anant Mehta

TL;DR
AmCLR and xAmCLR are contrastive learning frameworks for vision-language models. They integrate diverse augmentations and intra-modal alignment to improve robustness and efficiency while reducing computational requirements.
Contribution
The paper introduces AmCLR and xAmCLR, new contrastive learning objectives that enhance bimodal representation learning with fewer resources and richer augmentations.
Findings
AmCLR achieves comparable performance with batch sizes of a few hundred samples, whereas CLIP is reported to need a batch size of 32,768.
xAmCLR improves intra-modal alignment for better feature richness.
Both methods demonstrate increased robustness and efficiency.
Abstract
Contrastive learning has emerged as a pivotal framework for representation learning, underpinning advances in both unimodal and bimodal applications like SimCLR and CLIP. To address fundamental limitations like large batch size dependency and bimodality, methods such as SogCLR leverage stochastic optimization for the global contrastive objective. Inspired by SogCLR's efficiency and adaptability, we introduce AmCLR and xAmCLR objective functions tailored for bimodal vision-language models to further enhance the robustness of contrastive learning. AmCLR integrates diverse augmentations, including text paraphrasing and image transformations, to reinforce the alignment of contrastive representations, keeping the batch size limited to a few hundred samples, unlike CLIP, which needs a batch size of 32,768 to produce reasonable results. xAmCLR further extends this paradigm by incorporating intra-modal…
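To make the idea in the abstract concrete, here is a minimal NumPy sketch of a bimodal contrastive loss that pairs original and augmented views across modalities, with an optional intra-modal term. It is not the AmCLR or xAmCLR objective from the paper: it uses a plain mini-batch InfoNCE loss rather than SogCLR-style stochastic optimization, and the function names, the choice of averaging all four image-text view pairings, the `intra_weight` parameter, and the temperature value are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax_rows(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE: row i of `a` and row i of `b` form the positive pair."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax_rows(logits)[idx, idx].mean()
    loss_ba = -log_softmax_rows(logits.T)[idx, idx].mean()
    return 0.5 * (loss_ab + loss_ba)


def augmented_contrastive_loss(img, img_aug, txt, txt_aug,
                               intra_weight=0.0, temperature=0.1):
    """Hypothetical combination of cross-modal and intra-modal terms.

    Cross-modal: average InfoNCE over every (image view, text view) pairing.
    Intra-modal (assumed form): align each sample with its own augmentation
    within the same modality, weighted by `intra_weight`.
    """
    cross = [info_nce(i, t, temperature)
             for i in (img, img_aug)
             for t in (txt, txt_aug)]
    loss = np.mean(cross)
    if intra_weight > 0:
        intra = 0.5 * (info_nce(img, img_aug, temperature)
                       + info_nce(txt, txt_aug, temperature))
        loss = loss + intra_weight * intra
    return float(loss)
```

Each argument is an array of shape (batch, dim) holding encoder outputs for the original or augmented view of an image or caption. Setting `intra_weight=0.0` keeps only the cross-modal terms, while a positive value adds the within-modality alignment that the summary attributes to xAmCLR.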
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Speech Recognition and Synthesis
Methods: Average Pooling · Kaiming Initialization · Global Average Pooling · Max Pooling · Dense Connections · Convolution · Random Gaussian Blur · Color Jitter
