RI-Mamba: Rotation-Invariant Mamba for Robust Text-to-Shape Retrieval
Khanh Nguyen, Dasith de Silva Edirimuni, Ghulam Mubashar Hassan, Ajmal Mian

TL;DR
RI-Mamba is a novel rotation-invariant model for text-to-shape retrieval that effectively handles diverse object categories and orientations, significantly improving robustness and accuracy in large-scale 3D asset repositories.
Contribution
RI-Mamba introduces a rotation-invariant state-space model for point clouds, combining Hilbert sorting with a new orientational embedding strategy to enable robust retrieval across diverse and arbitrarily oriented objects.
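As an illustration of what Hilbert sorting does, here is a minimal sketch; it is not the authors' implementation. It quantizes points to an integer grid and orders them by Hilbert-curve index, computed with Skilling's "axes to transpose" algorithm. The names `hilbert_index` and `hilbert_sort` and the grid resolution `bits` are hypothetical. The sketch assumes the coordinates have already been expressed in a rotation-invariant frame and scaled to [0, 1), which is one way the resulting order could remain rotation invariant.

```python
def hilbert_index(coords, bits):
    """Map integer grid coordinates (each in [0, 2**bits)) to a Hilbert-curve index.

    Follows Skilling's "axes to transpose" algorithm, then interleaves the
    transposed bits into a single integer.
    """
    x = list(coords)
    n = len(x)
    top = 1 << (bits - 1)

    # Inverse undo: walk from the top bit down.
    q = top
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p                # invert the low bits of x[0]
            else:
                t = (x[0] ^ x[i]) & p    # exchange low bits of x[0] and x[i]
                x[0] ^= t
                x[i] ^= t
        q >>= 1

    # Gray encode.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = top
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    for i in range(n):
        x[i] ^= t

    # Interleave bits, most significant first, into one integer index.
    index = 0
    for b in range(bits - 1, -1, -1):
        for i in range(n):
            index = (index << 1) | ((x[i] >> b) & 1)
    return index


def hilbert_sort(points, bits=10):
    """Return points (each in [0, 1)^d) ordered along a Hilbert curve."""
    scale = 1 << bits

    def key(p):
        # Quantize to grid cells; clamp guards against values equal to 1.0.
        cell = [min(int(c * scale), scale - 1) for c in p]
        return hilbert_index(cell, bits)

    return sorted(points, key=key)
```

The useful property for sequence models is locality: cells that are consecutive along the curve are always neighbours in space, so nearby tokens in the sequence tend to describe nearby geometry.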
Findings
Achieves state-of-the-art performance on the OmniObject3D benchmark.
Demonstrates robustness to arbitrary object orientations.
Operates efficiently with linear time complexity.
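The linear-time property comes from the recurrent form of a state-space model: each token updates a fixed-size hidden state exactly once. The sketch below shows a diagonal, time-invariant scan for illustration only; it omits the input-dependent (selective) parameters that Mamba-style models use and is not the paper's architecture. The names `ssm_scan`, `a`, `b`, and `c` are hypothetical.

```python
def ssm_scan(xs, a, b, c):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = sum(c * h_t) over a scalar sequence.

    a, b, c are per-channel lists of equal length (a diagonal state matrix).
    Cost is O(len(xs) * len(a)): a single pass over the tokens.
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        # Update every state channel once per token.
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


# Impulse response of a single decaying channel.
print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[2.0]))
```

Doubling the sequence length doubles the work, in contrast to the quadratic cost of full self-attention.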
Abstract
3D assets have rapidly expanded in quantity and diversity due to the growing popularity of virtual reality and gaming. As a result, text-to-shape retrieval has become essential in facilitating intuitive search within large repositories. However, existing methods require canonical poses and support few object categories, limiting their real-world applicability where objects can belong to diverse classes and appear in random orientations. To address this challenge, we propose RI-Mamba, the first rotation-invariant state-space model for point clouds. RI-Mamba defines global and local reference frames to disentangle pose from geometry and uses Hilbert sorting to construct token sequences with meaningful geometric structure while maintaining rotation invariance. We further introduce a novel strategy to compute orientational embeddings and reintegrate them via feature-wise linear modulation,…
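To make the idea of a reference frame that disentangles pose from geometry concrete, here is a minimal sketch under stated assumptions. The abstract does not say how RI-Mamba builds its global and local frames, so the anchor choices below (the point farthest from the centroid, then the point farthest from that axis) are purely illustrative, and `invariant_coords` is a hypothetical name. The sketch assumes both anchors are unique and handles proper rotations only, not reflections.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, s):
    return [x * s for x in a]


def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def invariant_coords(points):
    """Express 3D points in a frame that rotates and translates with the shape."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    centered = [sub(p, centroid) for p in points]

    # First axis: direction to the point farthest from the centroid (assumed unique).
    e1 = normalize(max(centered, key=lambda p: dot(p, p)))

    # Second axis: Gram-Schmidt on the point farthest from the e1 axis.
    def residual(p):
        return sub(p, scale(e1, dot(p, e1)))

    v = max(centered, key=lambda p: dot(residual(p), residual(p)))
    e2 = normalize(residual(v))

    # Third axis completes a right-handed orthonormal frame.
    e3 = cross(e1, e2)

    # Coordinates in this frame do not change when the input is rotated.
    return [[dot(p, e1), dot(p, e2), dot(p, e3)] for p in centered]
```

Because every axis is derived from the points themselves, rotating the input rotates the frame by the same amount, and the projected coordinates stay fixed. The pose that was factored out is exactly the kind of information an orientational embedding could carry separately.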
Taxonomy
Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
