Evaluating and Enhancing Trustworthiness of LLMs in Perception Tasks

Malsha Ashani Mahawatta Dona; Beatriz Cabrero-Daniel; Yinan Yu; Christian Berger

arXiv:2408.01433·cs.CV·August 6, 2024

Open Access

TL;DR

This paper assesses hallucination detection strategies for multimodal LLMs in vehicle perception tasks, highlighting current limitations and proposing extensions to improve reliability in pedestrian detection.

Contribution

It systematically evaluates hallucination detection methods on state-of-the-art LLMs in automotive perception, introducing extensions that leverage temporal information to enhance detection accuracy.

Findings

01 Proprietary GPT-4V outperforms open LLaVA in pedestrian detection.

02 Current detection methods struggle with false negatives and localization.

03 Incorporating past information improves hallucination detection results.
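
The idea of using past information can be illustrated with a minimal sketch. This is not the authors' method: it assumes each frame yields a hypothetical yes/no answer to "is a pedestrian visible?" and flags an answer as suspicious when it disagrees with the majority of the preceding frames. The function name flag_with_history and the window parameter are illustrative assumptions, and an odd window size avoids ties.

```python
from collections import deque
from typing import Deque, List


def flag_with_history(answers: List[bool], window: int = 3) -> List[bool]:
    """Flag frames whose answer disagrees with the majority of recent frames.

    answers[i] is a hypothetical per-frame yes/no model answer
    (True = "pedestrian visible"). This is an illustrative assumption,
    not the procedure used in the paper.
    """
    history: Deque[bool] = deque(maxlen=window)
    flags: List[bool] = []
    for answer in answers:
        if len(history) == window:
            # Strict majority of the last `window` answers says "yes".
            majority = sum(history) * 2 > window
            flags.append(answer != majority)
        else:
            # Not enough past frames yet, so nothing is flagged.
            flags.append(False)
        history.append(answer)
    return flags


if __name__ == "__main__":
    per_frame = [True, True, True, False, True, True]
    print(flag_with_history(per_frame))
```

In this toy sequence only the fourth frame is flagged, because it contradicts three consecutive earlier answers. A real system would also need to handle pedestrians that genuinely enter or leave the scene, which a simple vote like this cannot distinguish from a hallucination.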

Abstract

Today's advanced driver assistance systems (ADAS), like adaptive cruise control or rear collision warning, are finding broader adoption across vehicle classes. Integrating advanced, multimodal Large Language Models (LLMs), which are capable of processing text, images, audio, and other data types, on board a vehicle may greatly enhance passenger comfort. Yet LLM hallucinations remain a major challenge to be addressed. In this paper, we systematically assess potential hallucination detection strategies for such LLMs in the context of object detection in vision-based data, using pedestrian detection and localization as an example. We evaluate three hallucination detection strategies applied to two state-of-the-art LLMs, the proprietary GPT-4V and the open LLaVA, on two datasets (Waymo/US and PREPER CITY/Sweden). Our results show that these LLMs can…

Taxonomy

Topics: Cloud Data Security Solutions · Business Process Modeling and Analysis