Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models

Mehran Tamjidi; Hamidreza Dastmalchi; Mohammadreza Alimoradijazi; Ali Cheraghian; Aijun An; Morteza Saberi

arXiv:2511.15311 · cs.CV · November 24, 2025

TL;DR

This paper introduces Uni-Adapter, a training-free online test-time adaptation method for 3D vision-language models that dynamically updates class prototypes to improve robustness against data distribution shifts without retraining.

Contribution

The paper presents a training-free test-time adaptation (TTA) strategy for 3D vision-language foundation models (VLFMs) that combines dynamic prototype learning with graph-based label smoothing to improve robustness under distribution shift.

Findings

01. Achieves state-of-the-art results on multiple 3D benchmarks.

02. Improves ModelNet-40C accuracy by 10.55%.

03. Enhances ScanObjectNN-C accuracy by 8.26%.

Abstract

3D Vision-Language Foundation Models (VLFMs) have shown strong generalization and zero-shot recognition capabilities in open-world point cloud processing tasks. However, these models often underperform in practical scenarios where data are noisy, incomplete, or drawn from a different distribution than the training data. To address this, we propose Uni-Adapter, a novel training-free online test-time adaptation (TTA) strategy for 3D VLFMs based on dynamic prototype learning. We define a 3D cache to store class-specific cluster centers as prototypes, which are continuously updated to capture intra-class variability in heterogeneous data distributions. These dynamic prototypes serve as anchors for cache-based logit computation via similarity scoring. Simultaneously, a graph-based label smoothing module captures inter-prototype similarities to enforce label consistency among similar…
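
The cache mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `PrototypeCache`, the function `adapt_step`, the running-mean update rule, the similarity threshold, and the exponential similarity scoring are all assumptions chosen for illustration, loosely following common cache-based adapters. The graph-based label smoothing module is not covered here.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

class PrototypeCache:
    """Hypothetical online cache holding a few cluster centers per class."""

    def __init__(self, num_classes, max_centers=3, new_center_threshold=0.8):
        self.num_classes = num_classes
        self.max_centers = max_centers          # assumed cap per class
        self.threshold = new_center_threshold   # assumed cosine threshold
        self.centers = [[] for _ in range(num_classes)]
        self.counts = [[] for _ in range(num_classes)]

    def update(self, feature, pseudo_label):
        """Merge the feature into its nearest center, or open a new one."""
        f = l2_normalize(feature)
        centers = self.centers[pseudo_label]
        counts = self.counts[pseudo_label]
        if centers:
            sims = np.array([c @ f for c in centers])
            j = int(sims.argmax())
            if sims[j] >= self.threshold or len(centers) >= self.max_centers:
                n = counts[j]
                # Running mean of the cluster, re-normalized to unit length.
                centers[j] = l2_normalize((n * centers[j] + f) / (n + 1))
                counts[j] = n + 1
                return
        centers.append(f)
        counts.append(1)

    def logits(self, feature, sharpness=5.0):
        """Per-class score: best similarity to any prototype of that class."""
        f = l2_normalize(feature)
        out = np.zeros(self.num_classes)
        for k, centers in enumerate(self.centers):
            if centers:
                sims = np.stack(centers) @ f
                out[k] = np.exp(-sharpness * (1.0 - sims)).max()
        return out

def adapt_step(cache, feature, text_embeds, alpha=1.0, scale=100.0):
    """One online step: zero-shot prediction, cache update, fused logits."""
    f = l2_normalize(feature)
    zero_shot = scale * (l2_normalize(text_embeds) @ f)
    cache.update(f, int(zero_shot.argmax()))    # pseudo-label from zero-shot
    return zero_shot + alpha * cache.logits(f)
```

Each test sample is processed once, in arrival order, and no gradients are computed, which is what makes a scheme of this kind training-free. Keeping several centers per class, rather than a single mean, is one simple way to represent intra-class variability.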

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications