FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models

Wan Xu; Tianyu Huang; Tianyu Qu; Guanglei Yang; Yiwen Guo; Wangmeng Zuo

arXiv:2312.17051 · cs.CV · January 9, 2025 · 1 citation

TL;DR

This paper introduces FILP-3D, a novel framework that leverages pre-trained vision-language models with specialized modules to improve 3D few-shot class-incremental learning, effectively addressing domain gap and noise issues.

Contribution

The paper proposes the FILP-3D framework with RFE and SNC modules, and introduces new benchmarks and metrics for 3D FSCIL evaluation, advancing the state-of-the-art.

Findings

1. FILP-3D outperforms existing methods on benchmarks.
2. RFE effectively aligns feature spaces of 3D data and pre-trained models.
3. SNC captures robust geometric features in point clouds.

Abstract

Few-shot class-incremental learning (FSCIL) aims to mitigate catastrophic forgetting when a model is incrementally trained on limited data. However, many existing FSCIL methods make little effective use of prior knowledge, so they cannot adequately address the domain gap that arises in 3D FSCIL, which in turn leads to catastrophic forgetting. The Contrastive Vision-Language Pre-Training (CLIP) model is a well-suited backbone for 3D FSCIL because of its abundant shape-related prior knowledge. Applying it directly to 3D FSCIL, however, runs into an incompatibility between 3D data representations and 2D features, which manifests primarily as feature space misalignment and significant noise. To address these challenges, we introduce the FILP-3D framework with two novel components: the Redundant Feature Eliminator (RFE) for…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Medical Imaging and Analysis

Methods: Focus · Rank Flow Embedding