Few-Shot Incremental 3D Object Detection in Dynamic Indoor Environments

Yun Zhu; Jianjun Qian; Jian Yang; Jin Xie; Na Zhao

arXiv:2604.07997 · cs.CV · April 10, 2026

TL;DR

FI3Det is a framework for few-shot incremental 3D object detection in indoor environments. It uses vision-language models (VLMs) to learn about unseen categories, so that novel classes can be detected from only a few samples.

Contribution

FI3Det introduces two components: a VLM-guided unknown object learning module, applied in the base stage, and a gated multimodal prototype imprinting approach for few-shot incremental detection.
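To make the idea of prototype imprinting concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it shows generic prototype imprinting (averaging normalized few-shot embeddings into a class prototype) and uses a fixed scalar gate as a stand-in for whatever gating FI3Det actually uses. The function names, the shared embedding dimension for the 3D and 2D features, and the scalar gate are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def imprint_prototype(support: Sequence[Sequence[float]]) -> Vector:
    """Build a class prototype from a few support embeddings.

    Generic imprinting: normalize each embedding, average, renormalize.
    """
    if not support:
        raise ValueError("at least one support embedding is required")
    normalized = [l2_normalize(v) for v in support]
    dim = len(normalized[0])
    mean = [sum(v[i] for v in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def gated_fusion(proto_3d: Sequence[float], proto_2d: Sequence[float],
                 gate: float) -> Vector:
    """Blend a 3D prototype with a 2D (VLM-derived) prototype.

    ASSUMPTION: a fixed scalar gate in [0, 1] and equal dimensions for both
    modalities. The paper's actual gating mechanism may differ.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    if len(proto_3d) != len(proto_2d):
        raise ValueError("prototypes must share one embedding dimension")
    mixed = [gate * a + (1.0 - gate) * b for a, b in zip(proto_3d, proto_2d)]
    return l2_normalize(mixed)


def cosine_scores(query: Sequence[float],
                  prototypes: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score a query embedding against unit-length class prototypes."""
    q = l2_normalize(query)
    return {name: sum(a * b for a, b in zip(q, p))
            for name, p in prototypes.items()}


if __name__ == "__main__":
    # Two hypothetical support embeddings per modality for one novel class.
    proto_3d = imprint_prototype([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]])
    proto_2d = imprint_prototype([[0.8, 0.2, 0.0], [0.7, 0.3, 0.0]])
    novel = gated_fusion(proto_3d, proto_2d, gate=0.5)
    print(cosine_scores([1.0, 0.1, 0.0], {"novel_class": novel}))
```

Because a prototype is computed directly from the few available samples, a new class can be added without retraining the classifier weights of existing classes, which is the usual motivation for imprinting in incremental settings.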

Findings

01. FI3Det achieves strong improvements over baselines on the ScanNet V2 and SUN RGB-D datasets.

02. The framework effectively mines unknown objects and constructs robust category prototypes.

03. It is presented as the first approach to few-shot incremental 3D object detection.

Abstract

Incremental 3D object perception is a critical step toward embodied intelligence in dynamic indoor environments. However, existing incremental 3D detection methods rely on extensive annotations of novel classes for satisfactory performance. To address this limitation, we propose FI3Det, a Few-shot Incremental 3D Detection framework that enables efficient 3D perception with only a few novel samples by leveraging vision-language models (VLMs) to learn knowledge of unseen categories. FI3Det introduces a VLM-guided unknown object learning module in the base stage to enhance perception of unseen categories. Specifically, it employs VLMs to mine unknown objects and extract comprehensive representations, including 2D semantic features and class-agnostic 3D bounding boxes. To mitigate noise in these representations, a weighting mechanism is further designed to re-weight the contributions of…

Code & Models

Repositories

zyrant/FI3Det (GitHub)
