CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes
Yulei Qin, Xingyu Chen, Yunhang Shen, Chaoyou Fu, Yun Gu, Ke Li, Xing Sun, Rongrong Ji

TL;DR
CAPro is a prototypical contrastive learning framework that aligns web images with their associated texts to learn visual representations from noisy web data, and it reports state-of-the-art results on WebVision1k and NUS-WIDE.
Contribution
The paper proposes a novel unified contrastive learning framework that uses textual and visual prototypes to handle noise and improve semantic alignment in webly supervised learning.
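As a rough illustration of what a prototype-based contrastive objective looks like in general, the sketch below builds one prototype per class as the mean of normalized embeddings and scores each sample against all prototypes with a temperature-scaled softmax. This is a generic textbook-style formulation, not CAPro's actual loss: the function names, the temperature value, and the mean-based prototype construction are all assumptions made for illustration, and the text-alignment step described in the paper is not modeled here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length; all-zero vectors stay zero."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def class_prototypes(embeddings, labels, num_classes):
    """Hypothetical helper: one prototype per class, taken as the
    re-normalized mean of that class's normalized embeddings."""
    embeddings = l2_normalize(embeddings)
    protos = np.zeros((num_classes, embeddings.shape[1]))
    for c in range(num_classes):
        members = embeddings[labels == c]
        if len(members) > 0:
            protos[c] = members.mean(axis=0)
    return l2_normalize(protos)


def prototypical_contrastive_loss(embeddings, labels, prototypes, temperature=0.1):
    """Generic prototype-level contrastive loss (assumed form):
    cross-entropy over cosine similarities to all class prototypes."""
    z = l2_normalize(embeddings)
    logits = z @ prototypes.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())
```

With well-separated clusters, samples paired with their own class prototype should yield a lower loss than the same samples paired with wrong labels, which is the intuition behind using prototypes to flag noisy web labels.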
Findings
Outperforms existing methods on WebVision1k and NUS-WIDE datasets.
Effectively handles label noise in both single-label and multi-label scenarios.
Demonstrates robustness to open-set recognition.
Abstract
Webly supervised learning has attracted increasing attention for its effectiveness in exploring publicly accessible data at scale without manual annotation. However, most existing methods of learning with web datasets are faced with challenges from label noise, and they have limited assumptions on clean samples under various noise. For instance, web images retrieved with queries of tiger cat (a cat species) and drumstick (a musical instrument) are almost dominated by images of tigers and chickens, which exacerbates the challenge of fine-grained visual concept learning. In this case, exploiting both web images and their associated texts is a requisite solution to combat real-world noise. In this paper, we propose Cross-modality Aligned Prototypes (CAPro), a unified prototypical contrastive learning framework to learn visual representations with correct semantics. For one thing, we…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Text and Document Classification Technologies
