Sparse Refinement for Efficient High-Resolution Semantic Segmentation

Zhijian Liu; Zhuoyang Zhang; Samir Khaki; Shang Yang; Haotian Tang; Chenfeng Xu; Kurt Keutzer; Song Han

arXiv:2407.19014 · cs.CV · July 30, 2024

TL;DR

SparseRefine is a novel method that efficiently enhances low-resolution semantic segmentation with sparse high-resolution refinements, enabling faster processing of high-res images with minimal accuracy loss.

Contribution

It introduces a universal sparse refinement framework that improves high-resolution semantic segmentation efficiency across various models.

Findings

01. Achieves 1.5 to 3.7 times speedup on multiple models.

02. Maintains accuracy with negligible to no loss.

03. Applicable to CNN- and ViT-based models.

Abstract

Semantic segmentation empowers numerous real-world applications, such as autonomous driving and augmented/mixed reality. These applications often operate on high-resolution images (e.g., 8 megapixels) to capture the fine details. However, this comes at the cost of considerable computational complexity, hindering the deployment in latency-sensitive scenarios. In this paper, we introduce SparseRefine, a novel approach that enhances dense low-resolution predictions with sparse high-resolution refinements. Based on coarse low-resolution outputs, SparseRefine first uses an entropy selector to identify a sparse set of pixels with high entropy. It then employs a sparse feature extractor to efficiently generate the refinements for those pixels of interest. Finally, it leverages a gated ensembler to apply these sparse refinements to the initial coarse predictions. SparseRefine can be seamlessly…
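The three stages named in the abstract (entropy selection, sparse refinement, gated ensembling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the nearest-neighbor upsampling, the fixed entropy `threshold`, and the scalar `gate` are all assumptions made for illustration, and the sparse feature extractor is stood in for by a hand-written array of refined logits.

```python
import numpy as np


def softmax(logits):
    """Softmax over the class axis (axis 0) of a (C, H, W) array."""
    z = logits - logits.max(axis=0, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)


def upsample_nearest(logits, scale):
    """Nearest-neighbor upsampling of (C, H, W) logits (an assumed choice)."""
    return logits.repeat(scale, axis=1).repeat(scale, axis=2)


def entropy_mask(logits, threshold):
    """Boolean (H, W) mask of pixels whose predictive entropy exceeds threshold."""
    p = softmax(logits)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=0)
    return entropy > threshold


def gated_ensemble(coarse, refinement, mask, gate):
    """Blend refined logits into the coarse logits at the selected pixels only.

    coarse: (C, H, W); refinement: (C, N) for the N selected pixels;
    gate: scalar in [0, 1] (hypothetical; a learned gate is not modeled here).
    """
    out = coarse.copy()
    out[:, mask] = (1.0 - gate) * coarse[:, mask] + gate * refinement
    return out


if __name__ == "__main__":
    # Two classes on a 2x2 low-resolution grid; one pixel is maximally uncertain.
    coarse_lr = np.array([[[5.0, 0.0], [5.0, 5.0]],
                          [[0.0, 0.0], [0.0, 0.0]]])
    coarse = upsample_nearest(coarse_lr, scale=2)   # (2, 4, 4)
    mask = entropy_mask(coarse, threshold=0.5)      # sparse set of pixels
    # Stand-in for the sparse feature extractor: refined logits favoring class 1.
    refinement = np.stack([np.zeros(mask.sum()), np.full(mask.sum(), 4.0)])
    refined = gated_ensemble(coarse, refinement, mask, gate=1.0)
    print("selected pixels:", int(mask.sum()))
    print(refined.argmax(axis=0))
```

Only the masked pixels are touched in the ensembling step, which is where the savings would come from when the high-entropy set is small relative to the full image.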

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

* The idea is simple and makes sense. The area to be refined is indeed sparse, so applying a sparse NN to that area makes sense and should improve the time complexity.
* I enjoyed the generality of the method. Because the method places no restrictions on the segmentation architecture and uses only the segmentation logits, it is applicable to any segmentation model. The segmentation model can be plug-and-play.
* The experiments are well conducted. The authors show the genera…

Weaknesses

* I’m not sure how the training data for the refinement was created. Training the refinement module requires sparse high-entropy pixels. How are these pixels acquired? Are they taken from the pretrained segmentation architectures? Also, is the refinement model trained separately for each of the NN architectures in Table 1, or is it universal?

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

This work explores applying a sparse refinement to the interpolated coarse prediction. It uses an entropy selector to sparsely identify the erroneous regions, so the prediction does not need to be refined at full image size. The approach therefore reduces computation during inference.

Weaknesses

1. I agree that the integration of multiple components into a feasible solution is a non-trivial task. However, the composition of such existing works implies that the proposed work lacks sufficient novelty.
2. The authors claim that the proposed work provides a significant speedup in inference. However, a comparison in terms of a more persuasive metric, GFLOPS, is missing, which is independent of the machine speed and commonly used for measuring the inference efficiency of a network model

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

S1. The proposed method succeeds in improving the inference speed (1.5x - 2.0x) of popular heavy-weight models while keeping the mIoU performance.
S2. Sparse feature extraction appears to be a powerful and under-researched computer vision technique.
S3. The simplicity of the method will likely lead to derivative future work.
S4. I was really surprised that looking at sparse pixels with so little context could contribute that much to the final performance.
S5. I was also surprised that showing low res

Weaknesses

W1. The three components of the solution (entropy-based uncertainty, Minkowski engine, weighted ensembles) have been proposed in the related work.
W2. Proper validation of the hyper-parameter \alpha has not been discussed (validating on test data is not acceptable).
W3. Training the sparse feature extractor requires a lot of computational power (96 RTX A6000 days).

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: Sparse Evolutionary Training