# A label masked autoencoder for image-guided segmentation label completion

**Authors:** Jiaru Jia, Mingzhe Liu, Dongfen Li, Xin Chen, Ruili Wang, Linlin Zhuo, Keqin Li

PMC · DOI: 10.1016/j.patter.2025.101455 · 2025-12-22

## TL;DR

This paper introduces a method to automatically correct flawed image annotations, improving segmentation accuracy without requiring new human labeling.

## Contribution

The novel label masked autoencoder (L-MAE) improves segmentation by reconstructing incomplete or corrupted mask labels using image-label fusion.

## Key findings

- Fusing incomplete labels with their images and restoring missing image information with the image patch supplement (IPS) algorithm improves average mean intersection over union (mIoU) by 4.1%.
- The method achieves predict area (PA)-mIoU scores of 91.0% on Pascal VOC 2012 and 86.4% on Cityscapes, outperforming state-of-the-art supervised segmentation models.
- Training segmentation models on an L-MAE-enhanced Pascal VOC dataset yields a 13.5% mIoU improvement over training on the degraded dataset.
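
For readers unfamiliar with the metric behind these numbers, the following is a minimal sketch of mean intersection over union for a single pair of label maps. It is not the authors' evaluation code: the function name `mean_iou`, the `ignore_index=255` convention, and the choice to skip classes absent from both maps are assumptions made for illustration. Benchmark mIoU is usually accumulated over a whole dataset rather than averaged per image.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target (hypothetical helper)."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    # Pixels labeled ignore_index in the ground truth are excluded (assumed convention).
    valid = target != ignore_index
    ious = []
    for c in range(num_classes):
        pred_c = (pred == c) & valid
        target_c = (target == c) & valid
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps, so it does not count
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")


# Toy 2x3 label maps with three classes.
target = np.array([[0, 0, 1],
                   [1, 1, 2]])
pred = np.array([[0, 1, 1],
                 [1, 1, 2]])
score = mean_iou(pred, target, num_classes=3)
print(score)  # per-class IoUs are 1/2, 3/4 and 1/1
```

In this toy case one pixel of class 0 is mislabeled as class 1, which lowers the IoU of both classes while class 2 stays perfect.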

## Abstract

Recent studies have demonstrated that high-quality annotated data are crucial for segmentation performance. However, incomplete or corrupted mask annotations remain common, limiting supervised learning. To address this, we introduce a mask-reconstruction task, referred to as masked segmentation label modeling (MSLM), which refines partially occluded labels by leveraging visible regions without manual annotations. We further propose the label masked autoencoder (L-MAE), which identifies erroneous regions and reconstructs them through contextual inference. The L-MAE fuses incomplete labels and corresponding images into a unified map for reconstruction, and an image patch supplement (IPS) algorithm restores missing image information, improving the average mean intersection over union (mIoU) by 4.1%. To validate the L-MAE, we train segmentation models on both a degraded Pascal VOC dataset and its L-MAE-enhanced counterpart, with the latter achieving a 13.5% mIoU improvement. The L-MAE attains predict area (PA)-mIoU scores of 91.0% on Pascal VOC 2012 and 86.4% on Cityscapes, outperforming state-of-the-art supervised segmentation models.
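
To make the MSLM setup more concrete, the sketch below hides a random subset of square patches in a label map, producing the kind of partially occluded label a reconstruction model would be asked to complete. This summary does not specify the masking strategy, so everything here is an assumption: the MAE-style random patch selection, the helper name `mask_label_patches`, the patch size, the mask ratio, and the fill value 255 are illustrative choices, not the authors' settings.

```python
import numpy as np


def mask_label_patches(label, patch_size=4, mask_ratio=0.5, fill_value=255, seed=0):
    """Hide a random fraction of square patches in a 2D label map (hypothetical helper).

    Returns the masked label map and a boolean pixel mask that is True
    where the label was hidden.
    """
    label = np.asarray(label)
    h, w = label.shape
    if h % patch_size or w % patch_size:
        raise ValueError("label height and width must be multiples of patch_size")
    grid_h, grid_w = h // patch_size, w // patch_size
    num_patches = grid_h * grid_w
    num_masked = int(round(mask_ratio * num_patches))

    # Pick which patches to hide, without replacement.
    rng = np.random.default_rng(seed)
    masked_ids = rng.choice(num_patches, size=num_masked, replace=False)
    patch_mask = np.zeros(num_patches, dtype=bool)
    patch_mask[masked_ids] = True

    # Expand the patch grid to pixel resolution.
    pixel_mask = (
        patch_mask.reshape(grid_h, grid_w)
        .repeat(patch_size, axis=0)
        .repeat(patch_size, axis=1)
    )
    masked = label.copy()
    masked[pixel_mask] = fill_value  # assumed "unknown" marker
    return masked, pixel_mask


# Toy 8x8 label map: left half is class 0, right half is class 1.
label = np.zeros((8, 8), dtype=np.int64)
label[:, 4:] = 1
masked_label, pixel_mask = mask_label_patches(label, patch_size=4, mask_ratio=0.5)
```

Calling the helper with several `mask_ratio` values would yield labels of varying completeness. The returned `pixel_mask` could also be used to restrict evaluation to the hidden region, though whether that matches the paper's PA-mIoU definition is not stated in this summary.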

- Label masked autoencoder enhances incomplete mask labels for semantic segmentation
- Multi-mask ratio inference generates labels with varying completeness for segmentation
- Integrates image-label features and restores details to improve segmentation accuracy

Semantic segmentation is a process by which a computer assigns a label to each pixel in an image, helping identify, for example, a road, pedestrian, tree, or tumor. These machine learning methods are usually trained on annotated image datasets labeled by humans. Meticulous labeling by human annotators, however, is often slow and costly, and existing annotated datasets may have errors or other flaws that limit their usefulness. Re-annotating such image datasets is often prohibitively expensive. Here, we present a method that can be used to automatically correct defective annotations. Methods such as this one could reduce the time that humans spend on relabeling tasks and help advance the development of computer vision applications, especially ones that require precise image segmentation.

Many segmentation datasets carry gaps or noise in their annotations, which blunts model training. Here, the authors present a label-image fusion approach that learns to fill missing or corrupted regions. By turning imperfect labels into dependable supervision, it upgrades existing datasets and lifts accuracy without fresh hand labeling. The idea offers a simple, scalable approach to maintaining and expanding datasets across benchmarks and application domains.

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12998695/full.md

---
Source: https://tomesphere.com/paper/PMC12998695