TL;DR
UniCon introduces a kernel-based framework for contrastive alignment that replaces slow stochastic optimization with exact, efficient updates, unifying linear and nonlinear encoders across various alignment tasks.
Contribution
The paper proposes a kernelized approach that yields closed-form solutions for contrastive alignment, improving training efficiency while maintaining performance.
Findings
Achieves substantial efficiency gains over minibatch stochastic optimization.
Unifies contrastive alignment across linear and nonlinear encoders and across one-to-one and many-to-many alignments.
Demonstrates strong empirical performance on synthetic, unimodal, multimodal, and zero-shot tasks.
Abstract
Contrastive objectives power state-of-the-art multimodal models, but their training remains slow, relying on long stochastic optimization. We propose a Unified Framework for Efficient Contrastive Alignment via Kernels (UniCon), which spans linear and nonlinear encoders as well as one-to-one and many-to-many alignments. At its core, UniCon introduces the contrastive similarity weight matrix, which enables closed-form global solutions that provably replace minibatch back-propagation with exact updates. Through the lens of reproducing kernel Hilbert spaces (RKHS), UniCon provides a kernelized perspective that unifies contrastive alignment and reveals its connection to spectral methods. To validate the theory, we conduct experiments on synthetic, unimodal, multimodal, and zero-shot tasks, demonstrating that UniCon achieves substantial efficiency gains while preserving generality…
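To make the link between closed-form solutions and spectral methods concrete, here is a minimal sketch of a generic linear alignment solved by a single SVD rather than by gradient descent. It is not UniCon's actual update rule: the function name `spectral_align`, the fixed weight matrix `S` (+1 on matched pairs, a small negative weight on all other pairs), the trace objective, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def spectral_align(X, Y, S, dim):
    """Hypothetical helper: maximise tr(Wx^T X^T S Y Wy) over matrices
    Wx, Wy with orthonormal columns. The maximiser is given by the top
    singular vectors of the weighted cross-covariance X^T S Y."""
    C = X.T @ S @ Y                                   # shape (dx, dy)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)  # s is sorted descending
    return U[:, :dim], Vt[:dim].T, s[:dim]

# Synthetic paired data sharing a k-dimensional latent (assumed setup).
rng = np.random.default_rng(0)
n, dx, dy, k = 200, 8, 6, 3
Z = rng.normal(size=(n, k))
X = Z @ rng.normal(size=(k, dx)) + 0.05 * rng.normal(size=(n, dx))
Y = Z @ rng.normal(size=(k, dy)) + 0.05 * rng.normal(size=(n, dy))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# Assumed weight matrix: +1 for matched pairs (diagonal),
# -1/(n-1) for every mismatched pair (off-diagonal).
S = np.eye(n) - (np.ones((n, n)) - np.eye(n)) / (n - 1)

Wx, Wy, sv = spectral_align(X, Y, S, k)
objective = np.trace(Wx.T @ X.T @ S @ Y @ Wy)  # equals sv.sum()
```

Under these assumptions the solution is obtained in one decomposition, with no learning rate, batch size, or epoch count to tune, which is the kind of saving the abstract attributes to replacing minibatch back-propagation with exact updates.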