Zeros can be Informative: Masked Binary U-Net for Image Segmentation on Tensor Cores
Chunshu Wu, Ruibing Song, Sushant Kondguli, Tong Geng, Ang Li

TL;DR
This paper introduces Masked Binary U-Net, a hardware-efficient binary neural network for real-time high-resolution image segmentation that maintains near full-precision accuracy while significantly improving speed and energy efficiency on GPUs.
Contribution
The paper proposes a novel masked binary U-Net architecture and a GPU implementation that together achieve high accuracy and efficiency for real-time image segmentation on resource-constrained devices.
Findings
Near full-precision accuracy with only 3% average accuracy drop.
Over 2x speedup compared to 16-bit floating point U-Net.
More than 3x energy reduction on GPUs.
Abstract
Real-time image segmentation is a key enabler for AR/VR, robotics, drones, and autonomous systems, where tight accuracy, latency, and energy budgets must be met on resource-constrained edge devices. While U-Net offers a favorable balance of accuracy and efficiency compared to large transformer-based models, achieving real-time performance on high-resolution input remains challenging due to compute, memory, and power limits. Extreme quantization, particularly binary networks, is appealing for its hardware-friendly operations. However, two obstacles limit practicality: (1) severe accuracy degradation, and (2) a lack of end-to-end implementations that deliver efficiency on general-purpose GPUs. We make two empirical observations that guide our design. (1) An explicit zero state is essential: training with zero masking to binary U-Net weights yields noticeable sparsity. (2) Quantization…
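The first observation in the abstract, that binary weights benefit from an explicit zero state, can be illustrated with a minimal sketch. The abstract does not say how the mask is obtained during training, so the magnitude-threshold rule below (`magnitude_mask`, `threshold`) is a hypothetical stand-in chosen only for illustration, and all function names are assumptions rather than the authors' code.

```python
def sign_binarize(latent):
    """Map each real-valued latent weight to +1 or -1 (zero maps to +1)."""
    return [1 if w >= 0 else -1 for w in latent]


def magnitude_mask(latent, threshold):
    """Hypothetical mask rule (an assumption, not from the paper):
    mask out weights whose magnitude falls below `threshold`."""
    return [0 if abs(w) < threshold else 1 for w in latent]


def apply_zero_mask(binary, mask):
    """Multiply binary weights by a 0/1 mask, giving values in {-1, 0, +1}."""
    return [b * m for b, m in zip(binary, mask)]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)


latent = [0.8, -0.05, -0.6, 0.02, 0.3, -0.9]
mask = magnitude_mask(latent, threshold=0.1)
masked = apply_zero_mask(sign_binarize(latent), mask)

print(masked)            # [1, 0, -1, 0, 1, -1]
print(sparsity(masked))  # fraction of masked-out weights
```

The effective weights take three values, but each one is still a binary sign plus a one-bit mask, which is what keeps the representation close to a pure binary network.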
Peer Reviews
Decision·ICLR 2026 Poster
* This paper is well organized and easy to follow, especially for readers who are not familiar with this area.
* The insight that introducing a large amount of ‘zero state’ into pure binary U-Nets is extremely helpful for segmentation is valuable.
* A major strength of this work is its practical GPU execution framework, which provides tangible, measurable speedup on widely available NVIDIA GPUs.
* Innovatively unlocks hardware potential with “Subtractive Bit-Encoding”, extending the BMMA (Binary m
* Introducing ‘zero state’ into a pure binary U-Net can significantly boost segmentation performance, but how the method performs on other types of networks and tasks remains unclear. However, the paper’s innovation in the low-level hardware implementation is still very solid.
S1 - Interesting finding that including zero-mask values seems to preserve the performance of the U-Net model. Moreover, an interesting tensor-core deployment scheme is shown.
S2 - The proposed system performs roughly as well as the full-precision U-Net while being much faster.
W1 - While INT8 and INT4 models are evaluated for performance (in 4.3), their latency / speed / energy is not shown. Could it be that INT8 and INT4 perform just as well as the proposed method in terms of speed?
W2 - The authors should have compared against another simple baseline, namely using TensorRT for inference optimization / quantization as in https://arxiv.org/pdf/2012.12259.
W3 - The paper is lacking in experimental results. For example, Section 4.4 summarizes insights already gathered in 4.3 an
1. The paper addresses real-time, high-resolution segmentation on edge devices, focusing on accuracy, latency, and energy. The empirical insights are straightforward and motivate a practical design, making the contribution both coherent and relevant.
2. The work combines masked binary weights with a cost-aware layer selection and a GPU execution framework using Tensor Cores. The subtractive bit-encoding and native binary operations show strong engineering rigor and enable deployable efficiency g
1. The method adds ternary weights to selected layers of a binary U-Net to approach full-precision accuracy with near-binary efficiency. However, the paper does not clearly compare against ternary quantization baselines. Could the authors clarify in which dimensions MBU-Net outperforms classical ternary methods?
2. The paper enhances binary networks by masking some weights to zero. Intuitively, this seems related to sparsity/pruning. Could the authors clarify: can this method be considered as a
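Several reviews above refer to a “subtractive bit-encoding” that lets masked (three-valued) weights run on binary Tensor Core operations, but the excerpts here do not define it. The sketch below shows only one plausible reading, assumed for illustration: a weight in {-1, 0, +1} is split into two bit-planes with w = pos - neg, so a dot product becomes the difference of two AND-plus-popcount results. The function names and the bit layout are hypothetical, and plain Python integers stand in for hardware bit vectors.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into a Python int (element i -> bit i)."""
    value = 0
    for i, b in enumerate(bits):
        value |= (b & 1) << i
    return value


def encode_ternary(weights):
    """Split {-1, 0, +1} weights into two bit-planes so that w = pos - neg."""
    pos = pack_bits([1 if w == 1 else 0 for w in weights])
    neg = pack_bits([1 if w == -1 else 0 for w in weights])
    return pos, neg


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def ternary_dot(x_bits, pos, neg):
    """Dot product of +/-1 activations (bit 1 = +1, bit 0 = -1)
    with ternary weights stored as two bit-planes."""
    # For a 0/1 plane p: sum of x over positions where p is set
    # equals 2 * popcount(x_bits & p) - popcount(p).
    plus = 2 * popcount(x_bits & pos) - popcount(pos)
    minus = 2 * popcount(x_bits & neg) - popcount(neg)
    return plus - minus


activations = [1, -1, -1, 1, 1, -1]
weights = [1, 0, -1, 0, 1, -1]

x_bits = pack_bits([1 if a == 1 else 0 for a in activations])
pos, neg = encode_ternary(weights)

reference = sum(a * w for a, w in zip(activations, weights))
result = ternary_dot(x_bits, pos, neg)
print(result, reference)  # 4 4
```

Under this reading, the zero state costs a second bit-plane and a second binary pass followed by a subtraction, rather than requiring a wider integer format.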
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Parallel Computing and Optimization Techniques
