REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders

Savya Khosla; Sethuraman TV; Barnett Lee; Alexander Schwing; Derek Hoiem

arXiv:2505.18153 · cs.CV · November 4, 2025

TL;DR

REN is a region encoding method that generates region representations from patch-based image encoders much faster than segmentation-based pipelines, while also improving token quality and outperforming existing methods on segmentation and retrieval tasks.

Contribution

REN introduces a lightweight module that generates region tokens directly, bypassing the segmentation step. This makes token generation 60x faster with 35x less memory while also improving token quality.
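For context, the segmentation-based pipelines that REN bypasses first run a class-agnostic segmenter and then turn each mask into a region representation using the encoder's patch features. The sketch below assumes mask-average pooling as that second step and assumes the masks have already been resized to the patch grid (the expensive segmenter itself is not shown). The function name `mask_pool_region_tokens` is hypothetical and not taken from the REN repository.

```python
import numpy as np


def mask_pool_region_tokens(patch_feats, masks):
    """Hypothetical baseline: average patch features inside each binary mask.

    patch_feats: (H, W, D) grid of patch features from an image encoder.
    masks: (R, H, W) boolean masks, one per region, already downsampled to
           the patch grid (a segmenter such as SAM would produce these at
           pixel resolution; that step is omitted here).
    Returns an (R, D) array with one region token per mask.
    """
    H, W, D = patch_feats.shape
    flat_feats = patch_feats.reshape(H * W, D)                    # (N, D)
    flat_masks = masks.reshape(masks.shape[0], H * W).astype(float)  # (R, N)
    counts = flat_masks.sum(axis=1, keepdims=True)                # patches per region
    counts = np.maximum(counts, 1.0)                              # empty masks yield zero tokens
    return (flat_masks @ flat_feats) / counts


# Usage: a 2x2 patch grid with 3-dim features and one mask covering the top row.
feats = np.arange(12, dtype=np.float64).reshape(2, 2, 3)
masks = np.zeros((1, 2, 2), dtype=bool)
masks[0, 0, :] = True
tokens = mask_pool_region_tokens(feats, masks)  # mean of the two top-row patches
```

The cost this pipeline pays is in producing `masks`, which requires running a full segmenter per image; the pooling itself is cheap.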

Findings

01 Outperforms the original encoders on segmentation and retrieval tasks

02 Achieves state-of-the-art results on the Ego4D VQ2D benchmark

03 Faster and more memory-efficient than prior methods

Abstract

We introduce the Region Encoder Network (REN), a fast and effective model for generating region-based image representations using point prompts. Recent methods combine class-agnostic segmenters (e.g., SAM) with patch-based image encoders (e.g., DINO) to produce compact and effective region representations, but they suffer from high computational cost due to the segmentation step. REN bypasses this bottleneck using a lightweight module that directly generates region tokens, enabling 60x faster token generation with 35x less memory, while also improving token quality. It uses a few cross-attention blocks that take point prompts as queries and features from a patch-based image encoder as keys and values to produce region tokens that correspond to the prompted objects. We train REN with three popular encoders (DINO, DINOv2, and OpenCLIP) and show that it can be extended to other encoders…
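The architecture described in the abstract (point prompts as queries, patch features as keys and values, a few cross-attention blocks) can be illustrated with a minimal NumPy sketch. Everything beyond that description is an assumption made for illustration: the class name `PointPromptRegionEncoder`, initializing each query from the patch under its point, single-head attention, random untrained weights, the default block count, and the omission of layer norms and MLPs. It is not the official implementation in savya08/ren.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class PointPromptRegionEncoder:
    """Hypothetical sketch: point-prompt queries cross-attend to patch features.

    Weights are random (untrained), attention is single-head, and layer
    norms / MLPs are omitted. Only the query/key/value roles follow the
    REN abstract.
    """

    def __init__(self, dim, num_blocks=3, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        # One (W_q, W_k, W_v, W_o) projection set per cross-attention block.
        self.blocks = [
            tuple(rng.normal(0.0, scale, size=(dim, dim)) for _ in range(4))
            for _ in range(num_blocks)
        ]

    def init_queries(self, patch_feats, points):
        """Assumption: start each query from the patch under its point.

        points: (P, 2) array of (row, col) coordinates normalized to [0, 1).
        """
        H, W, _ = patch_feats.shape
        rows = np.clip((points[:, 0] * H).astype(int), 0, H - 1)
        cols = np.clip((points[:, 1] * W).astype(int), 0, W - 1)
        return patch_feats[rows, cols]                      # (P, D)

    def __call__(self, patch_feats, points):
        H, W, D = patch_feats.shape
        kv_source = patch_feats.reshape(H * W, D)           # keys/values come from patches
        q = self.init_queries(patch_feats, points)          # queries come from point prompts
        for W_q, W_k, W_v, W_o in self.blocks:
            Q = q @ W_q                                     # (P, D)
            K = kv_source @ W_k                             # (N, D)
            V = kv_source @ W_v                             # (N, D)
            attn = softmax(Q @ K.T / np.sqrt(D), axis=-1)   # (P, N), rows sum to 1
            q = q + (attn @ V) @ W_o                        # residual update of each query
        return q                                            # (P, D): one region token per prompt


# Usage: a 4x4 grid of 8-dim patch features and two point prompts.
rng = np.random.default_rng(1)
feats = rng.normal(size=(4, 4, 8))
pts = np.array([[0.1, 0.1], [0.9, 0.6]])
region_tokens = PointPromptRegionEncoder(dim=8, num_blocks=2)(feats, pts)
```

In this sketch no mask is ever computed: each block's attention costs O(P·N·D) for P prompts and N patches, and each region token can draw on any patch in the image, not only the one under its point.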

Code & Models

Repositories

savya08/ren (PyTorch, official)

Videos

REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders (SlidesLive)