# GENNAV: Polygon Mask Generation for Generalized Referring Navigable Regions

**Authors:** Kei Katsumata, Yui Iioka, Naoki Hosomi, Teruhisa Misu, Kentaro Yamada, Komei Sugiura

arXiv: 2508.21102 · 2025-09-01

## TL;DR

GENNAV predicts whether target regions exist and generates segmentation masks for multiple stuff-type target regions (regions with ambiguous boundaries) from a natural language instruction and a front camera image. It outperforms baseline methods in diverse real-world urban environments.

## Contribution

We introduce GENNAV, a method that predicts the existence of multiple stuff-type target regions and segments them from a natural language instruction and an image. We also introduce GRiN-Drive, a new benchmark for this task.

## Key findings

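- GENNAV outperforms baseline methods on standard evaluation metrics on GRiN-Drive.
- It demonstrates robust zero-shot transfer in real-world experiments with four automobiles operated in five geographically distinct urban areas.
- The GRiN-Drive benchmark evaluates existence prediction and segmentation across no-target, single-target, and multi-target samples.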
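A benchmark that mixes no-target, single-target, and multi-target samples must define a "correct" answer when no target exists. The summary does not say which metrics GRiN-Drive uses. The sketch below assumes a convention used in generalized referring segmentation: an empty prediction on a no-target sample scores IoU 1.0, and any predicted pixels on such a sample score 0.0. It then averages IoU separately per sample type. The function names, the pixel-set mask representation, and the toy data are all hypothetical and for illustration only.

```python
from statistics import mean
from typing import Dict, FrozenSet, List, Tuple

Pixel = Tuple[int, int]  # (row, col)
Sample = Tuple[int, FrozenSet[Pixel], FrozenSet[Pixel]]  # (num_targets, pred, gt)


def sample_type(num_targets: int) -> str:
    """Bucket a sample by the number of annotated target regions."""
    if num_targets == 0:
        return "no-target"
    return "single-target" if num_targets == 1 else "multi-target"


def sample_iou(pred: FrozenSet[Pixel], gt: FrozenSet[Pixel]) -> float:
    """Pixel-set IoU; an empty prediction on an empty target counts as 1.0."""
    union = pred | gt
    if not union:
        return 1.0  # correctly reported that no target exists
    return len(pred & gt) / len(union)


def mean_iou_by_type(samples: List[Sample]) -> Dict[str, float]:
    """Average per-sample IoU separately for each sample type (assumed convention)."""
    buckets: Dict[str, List[float]] = {}
    for num_targets, pred, gt in samples:
        buckets.setdefault(sample_type(num_targets), []).append(sample_iou(pred, gt))
    return {name: mean(scores) for name, scores in buckets.items()}


# Hypothetical toy data: masks are sets of (row, col) pixels; for multi-target
# samples the ground truth is the union of all target regions.
a = frozenset({(0, 0), (0, 1)})
b = frozenset({(0, 1), (0, 2)})
result = mean_iou_by_type([
    (0, frozenset(), frozenset()),  # no target, empty prediction -> IoU 1.0
    (0, a, frozenset()),            # no target, false positive -> IoU 0.0
    (1, a, a),                      # single target, exact match -> IoU 1.0
    (2, a, b),                      # two targets, 1 of 3 union pixels shared -> IoU 1/3
])
print(result)
```

Reporting per-type averages keeps a model that never predicts "no target" from hiding its failures behind strong single-target scores.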

## Abstract

We focus on the task of identifying the location of target regions from a natural language instruction and a front camera image captured by a mobility (e.g., an automobile). This task is challenging because it requires both existence prediction and segmentation, particularly for stuff-type target regions with ambiguous boundaries. Existing methods often perform poorly on stuff-type target regions, as well as on cases with absent or multiple targets. To overcome these limitations, we propose GENNAV, which predicts target existence and generates segmentation masks for multiple stuff-type target regions. To evaluate GENNAV, we constructed a novel benchmark called GRiN-Drive, which includes three distinct types of samples: no-target, single-target, and multi-target. GENNAV achieved superior performance over baseline methods on standard evaluation metrics. Furthermore, we conducted real-world experiments with four automobiles operated in five geographically distinct urban areas to validate its zero-shot transfer performance. In these experiments, GENNAV outperformed baseline methods and demonstrated its robustness across diverse real-world environments. The project page is available at https://gennav.vercel.app/.
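The title frames GENNAV's output as polygon masks, and the abstract pairs segmentation with an existence prediction. The sketch below shows how such an output could become a binary segmentation mask. It assumes a hypothetical output format of an existence flag plus zero or more polygons in pixel coordinates; this is not GENNAV's actual interface. It rasterizes the polygons with an even-odd (ray-casting) point-in-polygon test at each pixel center, and leaves the mask empty when the target is predicted absent.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates
Polygon = Sequence[Point]


def point_in_polygon(x: float, y: float, poly: Polygon) -> bool:
    """Even-odd rule: count crossings of a ray cast from (x, y) toward +x."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the ray's y can cross it (so y1 != y2 here).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_to_mask(
    exists: bool, polygons: List[Polygon], height: int, width: int
) -> List[List[int]]:
    """Rasterize zero or more polygons into one binary mask (hypothetical format).

    If `exists` is False, the mask stays empty (the no-target case).
    Pixel (r, c) is sampled at its center (c + 0.5, r + 0.5).
    """
    mask = [[0] * width for _ in range(height)]
    if not exists:
        return mask
    for poly in polygons:
        if len(poly) < 3:
            continue  # degenerate polygon encloses no area
        for r in range(height):
            for c in range(width):
                if point_in_polygon(c + 0.5, r + 0.5, poly):
                    mask[r][c] = 1
    return mask


# Two disjoint target regions on a 4x6 image.
squares = [[(0, 0), (2, 0), (2, 2), (0, 2)], [(4, 1), (6, 1), (6, 3), (4, 3)]]
mask = polygons_to_mask(True, squares, height=4, width=6)
empty = polygons_to_mask(False, squares, height=4, width=6)
for row in mask:
    print("".join(map(str, row)))
# 110000
# 110011
# 000011
# 000000
```

The triple loop is O(H·W·V) and is meant only to make the geometry explicit. In practice, a library rasterizer such as OpenCV's `cv2.fillPoly` does the same job far faster.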

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21102/full.md

## Figures

27 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21102/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/2508.21102/full.md

---
Source: https://tomesphere.com/paper/2508.21102