TL;DR
This paper introduces DAC, a framework for 2D-3D cross-modal retrieval that handles noisy labels through adaptive sample division and self-correction, substantially improving retrieval performance on noisy-label benchmarks.
Contribution
The paper proposes a divide-and-conquer approach with adaptive credibility modeling and self-correction strategies to enhance 2D-3D retrieval under noisy label conditions.
Findings
DAC outperforms state-of-the-art models by large margins, e.g., +5.9% on ModelNet40 and +5.8% on Objaverse-N200.
Introduces Objaverse-N200, a new 200k-sample benchmark with noisy labels.
Abstract
With the recent burst of 2D and 3D data, cross-modal retrieval has attracted increasing attention. However, manual labeling by non-experts inevitably introduces corrupted annotations when 2D/3D content is ambiguous. Although previous works have addressed this issue with a naive division strategy based on hand-crafted thresholds, their performance is generally highly sensitive to the threshold value. Moreover, they fail to fully exploit the valuable supervisory signals within each divided subset. To tackle these problems, we propose a Divide-and-conquer 2D-3D cross-modal Alignment and Correction framework (DAC), which comprises Multimodal Dynamic Division (MDD) and Adaptive Alignment and Correction (AAC). Specifically, the former performs accurate sample division by adaptively modeling the credibility of each sample based on the compensation information within the multimodal loss…
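The abstract is cut off before it specifies how MDD models credibility, so what follows is only a minimal sketch under stated assumptions, not DAC's method. It contrasts the hand-crafted threshold division the abstract criticizes (threshold_divide) with an adaptive alternative: a two-component Gaussian mixture fitted by EM to per-sample losses. This is a common noisy-label technique (used, for example, in DivideMix) and is substituted here purely for illustration. Averaging min-max-normalized 2D and 3D losses is a hypothetical stand-in for the paper's "compensation information", and all function names are illustrative.

```python
import math
from typing import List, Tuple


def normalize(losses: List[float]) -> List[float]:
    """Min-max scale one modality's per-sample losses to [0, 1]."""
    lo, hi = min(losses), max(losses)
    span = hi - lo if hi > lo else 1.0
    return [(x - lo) / span for x in losses]


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm2(xs: List[float], iters: int = 50, min_var: float = 1e-4
             ) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D, two-component Gaussian mixture with plain EM."""
    n = len(xs)
    mu = sum(xs) / n
    var0 = max(sum((x - mu) ** 2 for x in xs) / n, min_var)
    weights, means, variances = [0.5, 0.5], [min(xs), max(xs)], [var0, var0]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in (0, 1)]
            total = sum(p) or 1e-300  # guard against both densities underflowing
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and (floored) variances.
        for k in (0, 1):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk, min_var)
    return weights, means, variances


def credibility(loss_2d: List[float], loss_3d: List[float]) -> List[float]:
    """Posterior probability that each sample belongs to the low-loss (clean) component."""
    # Hypothetical fusion: average the normalized 2D and 3D losses.
    fused = [(a + b) / 2.0 for a, b in zip(normalize(loss_2d), normalize(loss_3d))]
    weights, means, variances = fit_gmm2(fused)
    clean_k = 0 if means[0] <= means[1] else 1  # clean = lower-mean component
    creds = []
    for x in fused:
        p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in (0, 1)]
        if sum(p) > 0:
            creds.append(p[clean_k] / sum(p))
        else:  # both densities underflowed: fall back to the nearer mean
            near = abs(x - means[clean_k]) <= abs(x - means[1 - clean_k])
            creds.append(1.0 if near else 0.0)
    return creds


def adaptive_divide(loss_2d: List[float], loss_3d: List[float]):
    """Split indices by posterior credibility instead of a hand-tuned loss cut-off."""
    creds = credibility(loss_2d, loss_3d)
    clean = [i for i, c in enumerate(creds) if c >= 0.5]
    noisy = [i for i, c in enumerate(creds) if c < 0.5]
    return clean, noisy, creds


def threshold_divide(losses: List[float], tau: float):
    """Naive baseline: a fixed, hand-crafted threshold tau on the raw loss."""
    clean = [i for i, x in enumerate(losses) if x < tau]
    noisy = [i for i, x in enumerate(losses) if x >= tau]
    return clean, noisy


if __name__ == "__main__":
    # Toy per-sample losses; samples 4 and 5 mimic mislabeled pairs.
    loss_2d = [0.10, 0.12, 0.15, 0.11, 2.00, 2.20]
    loss_3d = [0.20, 0.18, 0.25, 0.22, 1.90, 2.40]
    clean, noisy, _ = adaptive_divide(loss_2d, loss_3d)
    print(clean, noisy)                      # [0, 1, 2, 3] [4, 5]
    print(threshold_divide(loss_2d, 0.13))   # ([0, 1, 3], [2, 4, 5])
    print(threshold_divide(loss_2d, 0.50))   # ([0, 1, 2, 3], [4, 5])
```

On these toy losses, threshold_divide changes its split as tau moves from 0.13 to 0.50, while adaptive_divide separates the two high-loss samples without any loss cut-off; the 0.5 cut on the posterior is simply the Bayes decision rule for the fitted mixture. Real training losses are rarely this cleanly separated, so the toy result is not evidence for DAC's reported gains.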
