Commands 4 Autonomous Vehicles (C4AV) Workshop Summary

Thierry Deruyttere; Simon Vandenhende; Dusan Grujicic; Yu Liu; Luc Van Gool; Matthew Blaschko; Tinne Tuytelaars; Marie-Francine Moens

arXiv:2009.08792 · cs.CV · September 21, 2020

TL;DR

This paper introduces a new challenge for visual grounding in autonomous vehicles using natural language commands, based on the Talk2Car dataset, and analyzes model performance and failure cases.

Contribution

It presents the C4AV challenge, compares it with existing datasets, and analyzes what makes models successful or prone to failure in this autonomous vehicle context.

Findings

01. Benchmark differs from existing datasets in scope and complexity.

02. Successful models leverage specific visual and linguistic features.

03. Failure cases reveal limitations in current visual grounding approaches.

Abstract

The task of visual grounding requires locating the most relevant region or object in an image, given a natural language query. So far, progress on this task has mostly been measured on curated datasets, which are not always representative of human spoken language. In this work, we deviate from recent, popular task settings and consider the problem under an autonomous vehicle scenario. In particular, we consider a situation where passengers can give a vehicle free-form natural language commands, each of which can be associated with an object in the street scene. To stimulate research on this topic, we have organized the Commands for Autonomous Vehicles (C4AV) challenge based on the recent Talk2Car dataset (URL: https://www.aicrowd.com/challenges/eccv-2020-commands-4-autonomous-vehicles). This paper presents the results of the challenge. First, we compare the used benchmark against…
