Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation

Tanner Schmidt; Richard Newcombe

arXiv:2506.11131·cs.CV·June 16, 2025

TL;DR

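Segment This Thing (STT) is a point-prompted segmentation model that foveates its input: it crops the image around the prompt and tokenizes patches at a resolution that decreases with distance from the prompt. This yields far fewer image tokens than uniform patch tokenization, which cuts computational cost without shrinking the model and allows interactive frame rates on consumer hardware.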

Contribution

The paper proposes a novel foveated patch tokenization method that makes point-prompted segmentation more efficient by reducing the number of image tokens rather than by reducing model size.

Findings

01 Achieves higher efficiency than prior segmentation models

02 Runs at interactive frame rates on consumer hardware

03 Maintains competitive performance on segmentation benchmarks

Abstract

This paper presents Segment This Thing (STT), a new efficient image segmentation model designed to produce a single segment given a single point prompt. Instead of following prior work and increasing efficiency by decreasing model size, we gain efficiency by foveating input images. Given an image and a point prompt, we extract a crop centered on the prompt and apply a novel variable-resolution patch tokenization in which patches are downsampled at a rate that increases with increased distance from the prompt. This approach yields far fewer image tokens than uniform patch tokenization. As a result we can drastically reduce the computational cost of segmentation without reducing model size. Furthermore, the foveation focuses the model on the region of interest, a potentially useful inductive bias. We show that our Segment This Thing model is more efficient than prior work while remaining…
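
The token savings described in the abstract can be illustrated with a small sketch. The layout below is a hypothetical one chosen for simplicity, not the pattern used in the paper: a central grid of full-resolution patches is surrounded by concentric rings whose patches double in side length at each ring. The function names and the parameters `patch`, `center_grid`, and `num_rings` are all assumptions made for this example.

```python
def foveated_patch_layout(patch=16, center_grid=4, num_rings=4):
    """Tile a square crop with patches that grow with distance from the center.

    Hypothetical layout (an assumption, not the paper's pattern): level 0 is a
    center_grid x center_grid block of patches with side `patch`; each further
    level is a ring of patches with twice the side of the previous level.
    Returns (crop_side, patches), each patch being (x, y, size) in crop pixels.
    """
    if center_grid % 4 != 0:
        raise ValueError("center_grid must be a multiple of 4")
    crop = patch * center_grid * 2 ** num_rings
    # Cells of a ring's grid that are already covered by the finer levels.
    lo, hi = center_grid // 4, 3 * center_grid // 4
    patches = []
    for level in range(num_rings + 1):
        size = patch * 2 ** level
        # Top-left corner of this level's grid, centered in the crop.
        offset = (crop - size * center_grid) // 2
        for row in range(center_grid):
            for col in range(center_grid):
                covered = level > 0 and lo <= row < hi and lo <= col < hi
                if covered:
                    continue
                patches.append((offset + col * size, offset + row * size, size))
    return crop, patches


def uniform_token_count(crop, patch=16):
    """Number of tokens if the same crop were cut into uniform patches."""
    return (crop // patch) ** 2


crop, patches = foveated_patch_layout()
print(crop, len(patches), uniform_token_count(crop))  # 1024 64 4096
```

In this sketch every patch would be downsampled to `patch` x `patch` pixels before embedding, so each one contributes a single token regardless of how much of the crop it covers. Under these assumed settings a 1024-pixel crop produces 64 tokens instead of the 4096 that uniform 16-pixel patches would produce; the actual token counts in the paper may differ.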

Code & Models

Repositories

facebookresearch/segment_this_thing (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques