# Semantic Segmentation and Effect Optimization of 3D Point Cloud Based on 2D Semantic Segmentation and Clustering for Construction Machinery Unstructured Environment

**Authors:** Shengjie Fu, Qipeng Cai, Zhongshen Li, Wentao Wang, Tianliang Lin, Qihuai Chen, Zhaoyuan Yao

PMC · DOI: 10.3390/s26041257 · 2026-02-14

## TL;DR

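This paper proposes a weakly supervised method for 3D semantic perception in the unstructured environments where construction machinery operates. It transfers 2D image segmentation labels onto clustered point clouds, so no annotated 3D point clouds are required.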

## Contribution

A weakly supervised 3D semantic perception framework that uses 2D image labels and clustering to avoid expensive 3D point cloud annotations.
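To make the idea of label transfer concrete, here is a minimal sketch of how 2D labels could reach 3D clusters. It is not the authors' implementation. It assumes a plain pinhole camera model, points already expressed in the camera frame, precomputed cluster ids, and a majority vote per cluster; the names `project_point`, `label_clusters`, `fx`, `fy`, `cx`, `cy`, and `unknown` are all hypothetical.

```python
from collections import Counter

def project_point(p, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (x, y, z) to a pixel (u, v)."""
    x, y, z = p
    if z <= 0:  # behind the camera, so it cannot appear in the image
        return None
    return int(round(fx * x / z + cx)), int(round(fy * y / z + cy))

def label_clusters(points, cluster_ids, mask, fx, fy, cx, cy, unknown=-1):
    """Give every point the majority 2D label collected by its cluster.

    mask is a 2D list indexed as mask[v][u] holding per-pixel class ids
    (assumed to come from an image segmentation network).
    """
    height, width = len(mask), len(mask[0])
    votes = {}
    for p, cid in zip(points, cluster_ids):
        uv = project_point(p, fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = uv
        if 0 <= u < width and 0 <= v < height:
            votes.setdefault(cid, Counter())[mask[v][u]] += 1
    # One label per cluster: the class that received the most votes.
    winner = {cid: c.most_common(1)[0][0] for cid, c in votes.items()}
    # Clusters with no visible point keep the `unknown` marker.
    return [winner.get(cid, unknown) for cid in cluster_ids]

# Toy data: a 4x4 mask whose left half is class 1 and right half is class 2.
mask = [[1, 1, 2, 2] for _ in range(4)]
points = [(-1.0, 0.0, 1.0), (-0.5, 0.0, 1.0), (0.0, 0.0, 1.0),
          (0.5, 0.0, 1.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]
cluster_ids = [0, 0, 0, 1, 1, 2]
labels = label_clusters(points, cluster_ids, mask, 2.0, 2.0, 2.0, 2.0)
```

Voting per cluster means that a point which projects onto the wrong pixel, or falls outside the image, still inherits the label of the cluster it belongs to.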

## Key findings

- The method achieves a mean Pixel Accuracy (mPA) of 84.72%.
- It attains a mean Intersection over Union (mIoU) of 75.85%; the abstract reports both metrics for the combined task of perceiving 3D semantic information and reconstructing target contours.
- Validation was performed using a custom unstructured scene dataset and real-world testing.
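For readers unfamiliar with the two metrics, the sketch below computes them from a confusion matrix using the common definitions: per-class accuracy is TP / (TP + FN), per-class IoU is TP / (TP + FP + FN), and each is averaged over the classes present in the ground truth. The paper may define or average them differently. The function names and the toy labels are illustrative only.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts samples of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def mpa_miou(cm):
    """Return (mPA, mIoU) averaged over classes that occur in the ground truth."""
    n = len(cm)
    accs, ious = [], []
    for c in range(n):
        tp = cm[c][c]
        gt = sum(cm[c])                          # TP + FN
        pred = sum(cm[r][c] for r in range(n))   # TP + FP
        if gt == 0:
            continue  # class absent from the ground truth: skip it
        accs.append(tp / gt)
        ious.append(tp / (gt + pred - tp))       # denominator is TP + FP + FN
    if not accs:
        raise ValueError("no ground-truth classes present")
    return sum(accs) / len(accs), sum(ious) / len(ious)

# Toy example with two classes and six labeled samples.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]
mpa, miou = mpa_miou(confusion_matrix(y_true, y_pred, 2))
```

In the toy example each class has two correct samples out of three, so mPA is 2/3, while each class has an intersection of 2 over a union of 4, so mIoU is 0.5. IoU also penalizes false positives, which is why it is typically lower than pixel accuracy, as in the reported 75.85% versus 84.72%.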

## Abstract

The operational environment of construction machinery is predominantly unstructured, characterized by rapid changes, high complexity, and irregularly distributed objects. This poses significant challenges for 3D semantic perception, particularly due to the high cost of acquiring point cloud semantic labels. To address this, a novel 3D semantic perception scheme is proposed for such unstructured environments. This scheme integrates image semantic segmentation results with point cloud clustering via perspective projection. The projection parameters are refined using Particle Swarm Optimization (PSO), and the semantic consistency of the fused results is further enhanced by a Kd-tree-based radius nearest neighbor (RNN) matching algorithm. Consequently, a weakly supervised framework is established that achieves accurate 3D semantic understanding using only 2D image labels, eliminating the need for annotated 3D point clouds. The feasibility and effectiveness of the scheme are validated through a dedicated unstructured scene dataset and real-world testing. Results demonstrate its capability to effectively perceive 3D semantic information and reconstruct target contours, achieving a mean Pixel Accuracy (mPA) of 84.72% and a mean Intersection over Union (mIoU) of 75.85%.
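The abstract names a Kd-tree-based radius nearest neighbor step but does not spell out its details. The sketch below shows only the general technique: build a Kd-tree over the points that already carry a label, then give each unlabeled point the majority label of the labeled points within a radius. The majority-vote rule, the `unknown` marker, and all function names are assumptions made for illustration.

```python
from collections import Counter

def build_kdtree(points, indices=None, depth=0):
    """Build a Kd-tree as nested tuples (point_index, axis, left, right)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])  # cycle through x, y, z
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2        # median split keeps the tree balanced
    return (indices[mid], axis,
            build_kdtree(points, indices[:mid], depth + 1),
            build_kdtree(points, indices[mid + 1:], depth + 1))

def radius_search(node, points, query, radius, found=None):
    """Collect indices of all points within `radius` of `query`."""
    if found is None:
        found = []
    if node is None:
        return found
    idx, axis, left, right = node
    p = points[idx]
    if sum((a - b) ** 2 for a, b in zip(p, query)) <= radius ** 2:
        found.append(idx)
    diff = query[axis] - p[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    radius_search(near, points, query, radius, found)
    if abs(diff) <= radius:  # the far side can only matter if the ball crosses the split
        radius_search(far, points, query, radius, found)
    return found

def fill_labels(points, labels, radius, unknown=-1):
    """Assign unlabeled points the majority label of labeled neighbors in range."""
    labeled = [i for i, lab in enumerate(labels) if lab != unknown]
    ref = [points[i] for i in labeled]
    tree = build_kdtree(ref)
    out = list(labels)
    for i, lab in enumerate(labels):
        if lab != unknown:
            continue
        hits = radius_search(tree, ref, points[i], radius)
        if hits:
            out[i] = Counter(labels[labeled[j]] for j in hits).most_common(1)[0][0]
    return out

# Toy cloud: two labeled groups, two unlabeled points near them, one far outlier.
pts = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
       (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (20.0, 20.0, 20.0)]
filled = fill_labels(pts, [1, 1, -1, 2, -1, -1], radius=0.5)
```

A radius query, unlike a fixed k-nearest query, leaves isolated points unlabeled instead of forcing a match with a distant neighbor, which seems a reasonable fit for irregularly distributed objects. In practice a library Kd-tree would replace this hand-written one.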

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12944190/full.md

---
Source: https://tomesphere.com/paper/PMC12944190