HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation

Weihuang Lin; Yiwei Ma; Xiaoshuai Sun; Shuting He; Jiayi Ji; Liujuan Cao; Rongrong Ji

arXiv:2507.12883 · cs.CV · August 14, 2025

TL;DR

HRSeg is a high-resolution perception and enhancement framework for reasoning segmentation. It uses fine-grained visual detail to segment objects more accurately when they are specified only by implicit instructions.

Contribution

The paper proposes HRSeg, a model for reasoning segmentation built around two modules: High-Resolution Perception (HRP), which processes high-resolution images through cropping, and High-Resolution Enhancement (HRE).

Findings

01 HRSeg outperforms existing methods on multiple benchmarks.

02 High-resolution modules improve segmentation accuracy.

03 Efficient high-resolution processing reduces computational costs.

Abstract

The reasoning segmentation task involves segmenting objects within an image by interpreting implicit user instructions, which may encompass subtleties such as contextual cues and open-world knowledge. Despite significant advancements made by existing approaches, they remain constrained by low perceptual resolution, as visual encoders are typically pre-trained at lower resolutions. Furthermore, simply interpolating the positional embeddings of visual encoders to enhance perceptual resolution yields only marginal performance improvements while incurring substantial computational costs. To address this, we propose HRSeg, an efficient model with high-resolution fine-grained perception. It features two key innovations: High-Resolution Perception (HRP) and High-Resolution Enhancement (HRE). The HRP module processes high-resolution images through cropping, integrating local and global features…
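The abstract contrasts two ways of raising perceptual resolution: running the encoder over the whole high-resolution image, or cropping the image and encoding the pieces. The sketch below is not the authors' implementation. It only illustrates why cropping can be cheaper, by counting pairwise self-attention interactions in a ViT-style encoder. The image side (896), crop size (224), patch size (14), the non-overlapping tiling, and the single extra downsampled global view are all illustrative assumptions, not values or design details taken from the paper.

```python
from typing import List, Tuple

# (top, left, bottom, right) in pixels; a hypothetical box layout for this sketch.
Box = Tuple[int, int, int, int]


def crop_boxes(height: int, width: int, crop: int) -> List[Box]:
    """Split a height x width image into non-overlapping crop x crop tiles.

    Assumes height and width are exact multiples of crop (an assumption made
    to keep the sketch short; the paper's cropping scheme may differ).
    """
    if height % crop or width % crop:
        raise ValueError("height and width must be multiples of crop")
    return [
        (top, left, top + crop, left + crop)
        for top in range(0, height, crop)
        for left in range(0, width, crop)
    ]


def num_tokens(side: int, patch: int) -> int:
    """Number of patch tokens for a square image of the given side length."""
    return (side // patch) ** 2


def attention_pairs_full(side: int, patch: int) -> int:
    """Token pairs when one encoder pass attends over the whole image."""
    n = num_tokens(side, patch)
    return n * n


def attention_pairs_cropped(side: int, crop: int, patch: int) -> int:
    """Token pairs when each tile is encoded independently.

    One extra pass is counted for a global view downsampled to crop x crop
    (an assumed stand-in for the "global features" mentioned in the abstract).
    """
    tiles = len(crop_boxes(side, side, crop))
    per_pass = num_tokens(crop, patch) ** 2
    return (tiles + 1) * per_pass


if __name__ == "__main__":
    side, crop, patch = 896, 224, 14  # illustrative values only
    full = attention_pairs_full(side, patch)
    cropped = attention_pairs_cropped(side, crop, patch)
    print(f"full-image attention pairs: {full}")
    print(f"cropped attention pairs:    {cropped}")
    print(f"ratio: {full / cropped:.1f}x")
```

Under these assumed numbers, attention over the full image touches roughly 15 times more token pairs than encoding 16 tiles plus one global view. This counts attention interactions only and ignores every other cost, so it should be read as a rough intuition rather than a measurement of HRSeg.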
