Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment