# Cross-view Semantic Segmentation for Sensing Surroundings

**Authors:** Bowen Pan, Jiankai Sun, Ho Yin Tiga Leung, Alex Andonian, Bolei Zhou

arXiv: 1906.03560 · 2020-07-28

## TL;DR

This paper introduces the Cross-view Semantic Segmentation task and the View Parsing Network (VPN), which parses first-view observations into a top-down-view semantic map of the surroundings. Because real-world top-down annotations are lacking, VPN is trained in a 3D graphics environment and transferred to real-world data with domain adaptation.

## Contribution

The paper proposes a new task, Cross-view Semantic Segmentation, and a framework for it, the View Parsing Network (VPN), and uses domain adaptation to transfer a model trained on synthetic data to real-world data.

## Key findings

- VPN generates top-down-view semantic maps from first-view 2D observations.
- A model trained in a 3D graphics environment transfers to real-world data through domain adaptation, including on a LoCoBot robot.
- Experiments on synthetic and real-world agents show that the model makes effective use of information from different views and modalities.

## Abstract

Sensing surroundings plays a crucial role in human spatial perception, as it extracts the spatial configuration of objects as well as the free space from the observations. To equip robot perception with such a surrounding-sensing capability, we introduce a novel visual task called Cross-view Semantic Segmentation as well as a framework named View Parsing Network (VPN) to address it. In the cross-view semantic segmentation task, the agent is trained to parse the first-view observations into a top-down-view semantic map indicating the spatial location of all the objects at the pixel level. The main issue of this task is that we lack real-world annotations of top-down-view data. To mitigate this, we train the VPN in a 3D graphics environment and utilize domain adaptation to transfer it to real-world data. We evaluate our VPN on both synthetic and real-world agents. The experimental results show that our model can effectively make use of the information from different views and multiple modalities to understand spatial information. Our further experiment on a LoCoBot robot shows that our model enables the surrounding-sensing capability from 2D image input. Code and demo videos: https://view-parsing-network.github.io
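
The input/output contract of the task (several first-view observations in, one top-down-view label map out) can be illustrated with a toy NumPy sketch. This is not the authors' VPN: the function name `cross_view_parse`, the weights `w_view` and `w_cls`, and all shapes are hypothetical, and a plain linear map summed across views stands in for whatever the trained network computes.

```python
import numpy as np

def cross_view_parse(view_feats, w_view, w_cls, top_shape):
    """Toy mapping from N first-view feature maps to one top-down label map.

    view_feats: (N, C, H, W) features, one map per first-view observation
    w_view:     (N, H*W, Ht*Wt) hypothetical per-view weights relating
                first-view positions to top-down positions
    w_cls:      (K, C) hypothetical per-pixel classifier over K classes
    top_shape:  (Ht, Wt) size of the top-down map
    returns:    (Ht, Wt) integer class labels
    """
    n, c, h, w = view_feats.shape
    flat = view_feats.reshape(n, c, h * w)            # (N, C, H*W)
    # Project each view into top-down coordinates and sum over views.
    top = np.einsum("ncp,npq->cq", flat, w_view)      # (C, Ht*Wt)
    logits = w_cls @ top                              # (K, Ht*Wt)
    return logits.argmax(axis=0).reshape(top_shape)   # (Ht, Wt)
```

With four 6x6 views, 8 feature channels, and 5 classes, `cross_view_parse(feats, w_view, w_cls, (5, 5))` returns a 5x5 grid of class indices, one per top-down cell. In a real system the weights would be learned; here they are assumed inputs so the shapes stay visible.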

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.03560/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1906.03560/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1906.03560/full.md

---
Source: https://tomesphere.com/paper/1906.03560