Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment
Hua Ye (1, 2), Hang Ding (3), Siyuan Chen (4), Yiyang Jiang (5), Changyuan Zhang (6), Xuan Zhang (2, 7) ((1) Nanjing University, (2) Airon Technology CO. LTD, (3) University of Bristol, (4) The Hong Kong Polytechnic University, (5) Shanghai Jiao Tong University

TL;DR
This paper introduces BACL, a boundary-aware curriculum learning method for multimodal alignment that turns ambiguous (borderline) negatives into a training signal, achieving state-of-the-art retrieval results without additional labels.
Contribution
The paper proposes a novel Boundary-Aware Curriculum with Local Attention (BACL) that enhances multimodal alignment by focusing on borderline negatives and highlighting mismatch regions.
Findings
Up to +32% R@1 over CLIP
New SOTA on four large-scale benchmarks
No extra labels needed
Abstract
Most multimodal models treat every negative pair alike, ignoring the ambiguous negatives that differ from the positive by only a small detail. We propose Boundary-Aware Curriculum with Local Attention (BACL), a lightweight add-on that turns these borderline cases into a curriculum signal. A Boundary-aware Negative Sampler gradually raises difficulty, while a Contrastive Local Attention loss highlights where the mismatch occurs. The two modules are fully differentiable and work with any off-the-shelf dual encoder. Theory predicts a fast O(1/n) error rate; practice shows up to +32% R@1 over CLIP and new SOTA on four large-scale benchmarks, all without extra labels.
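The abstract describes the Boundary-aware Negative Sampler only at a high level. The following is a minimal sketch of the general idea of a difficulty curriculum over negatives, written under our own assumptions and not taken from the authors' implementation. The linear schedule, the helper names `difficulty_window` and `sample_negatives`, the `min_frac` floor, and the temperature of 0.07 are all hypothetical. The loss shown is a standard InfoNCE objective, not BACL's Contrastive Local Attention loss.

```python
import math
import random
from typing import List, Sequence

# Hypothetical sketch of curriculum negative sampling; not the BACL implementation.
Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def difficulty_window(step: int, total_steps: int, pool_size: int,
                      min_frac: float = 0.1) -> int:
    """Assumed linear schedule: how many of the most-similar candidates are
    eligible as negatives. Step 0 -> the whole pool (mostly easy negatives);
    final step -> only the top `min_frac` of the pool (borderline negatives)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    frac = 1.0 - (1.0 - min_frac) * progress
    return max(1, round(frac * pool_size))


def sample_negatives(anchor: Vector, candidates: List[Vector], k: int,
                     step: int, total_steps: int,
                     rng: random.Random) -> List[int]:
    """Rank candidates by similarity to the anchor, keep the current
    curriculum window, and draw up to k negative indices from it."""
    ranked = sorted(range(len(candidates)),
                    key=lambda i: cosine(anchor, candidates[i]),
                    reverse=True)
    window = ranked[:difficulty_window(step, total_steps, len(candidates))]
    return rng.sample(window, min(k, len(window)))


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.07) -> float:
    """Standard InfoNCE loss: negative log-softmax score of the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [1.0, 0.0, 0.0]          # anchor image embedding
    caption = [0.9, 0.1, 0.0]        # matching caption
    pool = [[0.0, 1.0, 0.0],         # unrelated caption (easy)
            [0.0, 0.0, 1.0],         # unrelated caption (easy)
            [0.85, 0.2, 0.0]]        # near-duplicate caption (borderline)
    late = sample_negatives(image, pool, k=1, step=100, total_steps=100, rng=rng)
    print(late)  # [2]
    print(info_nce(image, caption, [pool[late[0]]]))
```

Early in training the window spans the whole candidate pool, so most sampled negatives are easy; by the final step only the near-duplicate caption remains eligible, and the InfoNCE loss it produces is larger than the loss from the unrelated captions. A linear schedule is simply the easiest to read; any monotone schedule would express the same "gradually raise difficulty" idea.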
