# Self-Supervised Learning with Trilateral Redundancy Reduction for Urban Functional Zone Identification Using Street-View Imagery

**Authors:** Kun Zhao, Juan Li, Shuai Xie, Lijian Zhou, Wenbin He, Xiaolin Chen

PMC · DOI: 10.3390/s25051504 · 2025-02-28

## TL;DR

This paper introduces a new self-supervised learning framework for identifying urban functional zones using street-view images, reducing the need for labeled data.

## Contribution

The paper proposes the Trilateral Redundancy Reduction (Tri-ReD) framework, which combines a trilateral loss with tri-branch mutually exclusive augmentation (Tri-MExA) to improve self-supervised pre-training for urban scene classification.

## Key findings

- Tri-ReD outperforms direct supervised learning by 19% on average for urban functional zone identification.
- The framework surpasses models pre-trained on ImageNet by around 11%.
- Tri-ReD is architecture-agnostic and works effectively with both CNNs and ViTs.

## Abstract

In recent years, the use of street-view images for urban analysis has received much attention. Despite the abundance of raw data, existing supervised learning methods rely heavily on large-scale, high-quality labels. To address label scarcity in urban scene classification tasks, an innovative self-supervised learning framework, Trilateral Redundancy Reduction (Tri-ReD), is proposed. The framework introduces a more restrictive loss, the “trilateral loss”. By compelling the embeddings of positive samples to be highly correlated, this loss guides the pre-trained model to learn more essential representations without semantic labels. Furthermore, a novel data augmentation strategy, tri-branch mutually exclusive augmentation (Tri-MExA), is proposed to reduce the uncertainties introduced by traditional random augmentation methods. As a model pre-training method, the Tri-ReD framework is architecture-agnostic and performs effectively on both CNNs and ViTs, which makes it adaptable to a wide variety of downstream tasks. In this paper, 116,491 unlabeled street-view images were used to pre-train models with Tri-ReD to obtain a general representation of urban scenes at the ground level. These pre-trained models were then fine-tuned on supervised data with semantic labels (17,600 images from BIC_GSV and 12,871 from BEAUTY) for the final classification task. Experimental results demonstrate that the proposed self-supervised pre-training method outperformed direct supervised learning approaches for urban functional zone identification by 19% on average. It also surpassed models pre-trained on ImageNet by around 11%, achieving state-of-the-art (SOTA) results in self-supervised pre-training.
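The abstract does not give the formula for the trilateral loss or the exact rules of Tri-MExA, so the following is only a rough sketch of the idea under stated assumptions. It assumes a Barlow Twins-style redundancy-reduction term (cross-correlation matrix pushed toward the identity) summed over the three pairs of branches, and it assumes that "mutually exclusive" means the three branches draw from disjoint sets of augmentation operations. All function names, the weight `lam`, and the operation names are hypothetical and are not taken from the paper.

```python
import numpy as np


def cross_correlation(za, zb, eps=1e-8):
    """Cross-correlation matrix of two (batch, dim) embedding batches."""
    # Standardize each embedding dimension over the batch.
    za = (za - za.mean(axis=0)) / (za.std(axis=0) + eps)
    zb = (zb - zb.mean(axis=0)) / (zb.std(axis=0) + eps)
    return za.T @ zb / za.shape[0]


def pair_loss(za, zb, lam=5e-3):
    """Redundancy-reduction loss for one pair of branches (assumed form)."""
    c = cross_correlation(za, zb)
    diag = np.diag(c)
    on_diag = np.sum((diag - 1.0) ** 2)          # pull matching dims together
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # decorrelate other dims
    return on_diag + lam * off_diag


def trilateral_loss(z1, z2, z3, lam=5e-3):
    """Assumed trilateral form: sum of the pair loss over all three pairs."""
    return (pair_loss(z1, z2, lam)
            + pair_loss(z1, z3, lam)
            + pair_loss(z2, z3, lam))


def assign_exclusive_augmentations(ops, rng):
    """Split a pool of augmentation names into three disjoint groups.

    This is one possible reading of 'mutually exclusive': no operation
    is shared between branches.
    """
    shuffled = [str(op) for op in rng.permutation(ops)]
    return [shuffled[i::3] for i in range(3)]


rng = np.random.default_rng(0)

# Hypothetical augmentation pool, one disjoint group per branch.
pool = ["crop", "flip", "color_jitter", "grayscale", "blur", "solarize"]
branches = assign_exclusive_augmentations(pool, rng)

# Three identical embedding batches versus three unrelated ones.
z = rng.normal(size=(64, 8))
aligned = trilateral_loss(z, z, z)
unrelated = trilateral_loss(rng.normal(size=(64, 8)),
                            rng.normal(size=(64, 8)),
                            rng.normal(size=(64, 8)))
```

In this sketch, `aligned` comes out much smaller than `unrelated`, which matches the abstract's description of a loss that compels positive-sample embeddings to be highly correlated. A real pre-training loop would compute the embeddings with a CNN or ViT backbone and an automatic-differentiation framework rather than NumPy.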

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11902646/full.md

---
Source: https://tomesphere.com/paper/PMC11902646