Structure-Aware Residual-Center Representation for Self-Supervised Open-Set 3D Cross-Modal Retrieval

Yang Xu, Yifan Feng, and Yu Jiang

arXiv:2407.15376·cs.MM·July 23, 2024

Open Access

TL;DR

This paper introduces a self-supervised framework for open-set 3D cross-modal retrieval that handles unseen categories through hierarchical structure learning and residual-center embeddings.

Contribution

The proposed SRCR framework employs Residual-Center Embedding and Hierarchical Structure Learning to improve open-set 3D cross-modal retrieval performance without relying on category priors.

Findings

01 Outperforms state-of-the-art methods on four benchmarks

02 Handles unseen categories in open-set environments

03 Demonstrates robustness through extensive ablation studies

Abstract

Existing 3D cross-modal retrieval methods rely heavily on category distribution priors in the training set, which reduces their effectiveness on unseen categories in open-set environments. To tackle this problem, we propose the Structure-Aware Residual-Center Representation (SRCR) framework for self-supervised open-set 3D cross-modal retrieval. To address the center deviation caused by category distribution differences, we learn a Residual-Center Embedding (RCE) for each object with nested auto-encoders, rather than mapping objects directly to modality or category centers. In addition, we apply Hierarchical Structure Learning (HSL) to leverage the high-order correlations among objects for generalization, constructing a heterogeneous hypergraph structure based on hierarchical inter-modality, intra-object, and implicit-category correlations. Extensive…
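
The hypergraph construction mentioned in the abstract can be pictured with a small incidence-matrix sketch. This is an illustration under stated assumptions, not the authors' implementation: the function name `build_incidence`, the parameter `k`, and the concrete rule used for each hyperedge type are hypothetical. In particular, treating "inter-modality" as one hyperedge per modality and "implicit-category" as a k-nearest-neighbour hyperedge per vertex are placeholder choices, since the abstract does not say how the paper defines them.

```python
import numpy as np


def build_incidence(features, k=2):
    """Build a 0/1 hypergraph incidence matrix H (n_vertices x n_hyperedges).

    features: array of shape (n_objects, n_modalities, dim), one feature
    vector per (object, modality) pair. Each pair is one vertex.
    All grouping rules below are illustrative assumptions.
    """
    n_obj, n_mod, dim = features.shape
    n_vert = n_obj * n_mod
    # Vertex index convention: vertex = object_index * n_mod + modality_index
    flat = features.reshape(n_vert, dim)

    edges = []

    # Intra-object hyperedges: join all modality views of the same object.
    for o in range(n_obj):
        edges.append([o * n_mod + m for m in range(n_mod)])

    # Inter-modality hyperedges (assumed rule): one per modality,
    # joining every object's view in that modality.
    for m in range(n_mod):
        edges.append([o * n_mod + m for o in range(n_obj)])

    # Implicit-category hyperedges (assumed rule): each vertex together
    # with its k nearest neighbours in feature space, no labels used.
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    for v in range(n_vert):
        nearest = np.argsort(dists[v])[: k + 1]  # k + 1 closest vertices
        edges.append(nearest.tolist())

    H = np.zeros((n_vert, len(edges)), dtype=np.float32)
    for e, verts in enumerate(edges):
        H[verts, e] = 1.0
    return H


# Toy usage: 4 objects, 3 modalities, 8-dimensional random features.
rng = np.random.default_rng(0)
H = build_incidence(rng.normal(size=(4, 3, 8)), k=2)
print(H.shape)
```

An incidence matrix like this is the usual input to hypergraph convolution layers; how SRCR weights or learns over the hyperedges is not described in the abstract and is left out here.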

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction