Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving

Kaavya Rekanar; Ciarán Eising; Ganesh Sistu; Martin Hayes

arXiv:2307.09329 · cs.CV · July 31, 2023 · 1 citation

TL;DR

This paper provides an initial performance analysis of three pre-trained VQA models in autonomous driving scenarios, focusing on their response similarity to expert answers and the impact of multimodal architecture features.

Contribution

It introduces a preliminary evaluation of ViLBERT, ViLT, and LXMERT models for driving-related VQA tasks, highlighting the influence of cross-modal attention and fusion techniques.

Findings

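01 Models that incorporate cross-modal attention show promising potential for generating better answers to driving-related questions.

02 Late fusion techniques show similarly promising results.

03 The analysis sets the stage for a forthcoming comparative study involving nine VQA models.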

Abstract

This short paper presents a preliminary analysis of three popular Visual Question Answering (VQA) models, namely ViLBERT, ViLT, and LXMERT, in the context of answering questions relating to driving scenarios. The performance of these models is evaluated by comparing the similarity of responses to reference answers provided by computer vision experts. Model selection is predicated on the analysis of transformer utilization in multimodal architectures. The results indicate that models incorporating cross-modal attention and late fusion techniques exhibit promising potential for generating improved answers within a driving perspective. This initial analysis serves as a launchpad for a forthcoming comprehensive comparative study involving nine VQA models and sets the scene for further investigations into the effectiveness of VQA model queries in self-driving scenarios. Supplementary…
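The evaluation idea in the abstract, scoring a model's response by its similarity to expert reference answers, can be illustrated with a small sketch. The abstract does not name the similarity measure used, so the token-set Jaccard similarity below is an assumption chosen for simplicity, not the authors' metric. The function names and the sample answers are likewise hypothetical.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity in [0, 1] (assumed metric, not the paper's)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0  # two empty answers are treated as identical
    return len(ta & tb) / len(ta | tb)


def best_reference_similarity(model_answer: str, references: list[str]) -> float:
    """Score a model answer against its closest expert reference answer."""
    if not references:
        raise ValueError("at least one reference answer is required")
    return max(jaccard(model_answer, ref) for ref in references)


if __name__ == "__main__":
    # Hypothetical reference answers and model outputs, for illustration only.
    references = ["yes, the car should stop", "stop"]
    answers = {"model_a": "stop", "model_b": "yes", "model_c": "go straight"}
    for name, answer in answers.items():
        print(name, round(best_reference_similarity(answer, references), 3))
```

Taking the maximum over references rewards an answer that matches any one expert phrasing. A word-overlap measure like this ignores synonyms and word order, so an embedding-based similarity may be a better fit for free-form answers.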

Code & Models

Repositories

kaavyarekanar/towards-a-performance-analysis-on-pre-trained-vqa-models-for-autonomous-driving
none · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Vision-and-Language BERT · Learning Cross-Modality Encoder Representations from Transformers