SAM3-LiteText: An Anatomical Study of the SAM3 Text Encoder for Efficient Vision-Language Segmentation

Chengxi Zeng; Yuxuan Jiang; Ge Gao; Shuai Wang; Duolikun Danier; Bin Zhu; Stevan Rudinac; David Bull; and Fan Zhang

arXiv:2602.12173·cs.AI·February 13, 2026

TL;DR

This paper analyzes redundancy in the text encoders of current vision-language segmentation models and introduces SAM3-LiteText, a lightweight, distilled encoder that cuts text encoder parameters by up to 88% while maintaining performance.

Contribution

It provides a large-scale anatomical analysis of text prompting in segmentation models and proposes a compact, efficient text encoder based on knowledge distillation.

Findings

01. Redundant usage of context windows and vocabulary in prompts.

02. Low-dimensional manifold structure in text embeddings.

03. Up to 88% reduction in text encoder parameters with maintained performance.
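
As a rough illustration of the first finding, the sketch below measures how much of a fixed context window a set of prompts occupies and what fraction of a vocabulary they touch. It is not the paper's analysis code: the function name `prompt_redundancy_stats`, the whitespace tokenizer, the context length of 32, the vocabulary size of 1000, and the three sample prompts are all assumptions chosen for illustration.

```python
from collections import Counter


def prompt_redundancy_stats(prompts, context_length, vocab_size):
    """Return (mean context-window utilization, vocabulary coverage).

    Tokenization is plain lowercase whitespace splitting, a stand-in
    (assumption) for whatever tokenizer the real text encoder uses.
    """
    if not prompts:
        raise ValueError("prompts must be non-empty")
    token_counts = Counter()
    utilizations = []
    for prompt in prompts:
        tokens = prompt.lower().split()
        token_counts.update(tokens)
        # Fraction of the fixed context window occupied by real tokens.
        utilizations.append(min(len(tokens), context_length) / context_length)
    mean_utilization = sum(utilizations) / len(utilizations)
    # Fraction of the vocabulary that appears at least once in the prompts.
    vocab_coverage = len(token_counts) / vocab_size
    return mean_utilization, vocab_coverage


# Hypothetical prompts and sizes, for illustration only.
prompts = ["red car", "person riding a bike", "a red bike"]
mean_util, coverage = prompt_redundancy_stats(
    prompts, context_length=32, vocab_size=1000
)
print(f"mean utilization: {mean_util:.3f}, vocabulary coverage: {coverage:.3f}")
```

Low values for both numbers on a real prompt corpus would correspond to the kind of underused context window and sparse vocabulary usage described above.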

Abstract

Vision-language segmentation models such as SAM3 enable flexible, prompt-driven visual grounding, but inherit large, general-purpose text encoders originally designed for open-ended language understanding. In practice, segmentation prompts are short, structured, and semantically constrained, leading to substantial over-provisioning in text encoder capacity and persistent computational and memory overhead. In this paper, we perform a large-scale anatomical analysis of text prompting in vision-language segmentation, covering 404,796 real prompts across multiple benchmarks. Our analysis reveals severe redundancy: most context windows are underutilized, vocabulary usage is highly sparse, and text embeddings lie on a low-dimensional manifold despite their high-dimensional representations. Motivated by these findings, we propose SAM3-LiteText, a lightweight text encoding framework that replaces the…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Natural Language Processing Techniques