FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models
Wan Xu, Tianyu Huang, Tianyu Qu, Guanglei Yang, Yiwen Guo, Wangmeng Zuo

TL;DR
This paper introduces FILP-3D, a novel framework that leverages pre-trained vision-language models with specialized modules to improve 3D few-shot class-incremental learning, effectively addressing domain gap and noise issues.
Contribution
The paper proposes the FILP-3D framework with its RFE and SNC modules, and introduces new benchmarks and metrics for evaluating 3D FSCIL, advancing the state of the art.
Findings
FILP-3D outperforms existing methods on benchmarks.
RFE effectively aligns feature spaces of 3D data and pre-trained models.
SNC captures robust geometric features in point clouds.
Abstract
Few-shot class-incremental learning (FSCIL) aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data. However, many of these works lack effective exploration of prior knowledge, rendering them unable to effectively address the domain gap issue in the context of 3D FSCIL, thereby leading to catastrophic forgetting. The Contrastive Vision-Language Pre-Training (CLIP) model serves as a highly suitable backbone for addressing the challenges of 3D FSCIL due to its abundant shape-related prior knowledge. Unfortunately, its direct application to 3D FSCIL still faces the incompatibility between 3D data representation and the 2D features, primarily manifested as feature space misalignment and significant noise. To address the above challenges, we introduce the FILP-3D framework with two novel components: the Redundant Feature Eliminator (RFE) for…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Medical Imaging and Analysis
Methods: Focus · Rank Flow Embedding
