A Comprehensive Survey of Hallucination in Large Language, Image, Video and Audio Foundation Models

Pranab Sahoo; Prabhash Meharia; Akash Ghosh; Sriparna Saha; Vinija Jain; Aman Chadha

arXiv:2405.09589 · cs.LG · October 4, 2024 · 2 cites

Open Access

TL;DR

This survey reviews recent progress in understanding, detecting, and reducing hallucinations in large foundation models across multiple modalities, highlighting challenges and future directions for reliable AI systems.

Contribution

It provides a comprehensive framework and taxonomy for hallucination in multimodal foundation models, synthesizing recent detection and mitigation strategies.

Findings

01. Hallucination remains a major challenge across modalities.

02. Recent detection methods improve reliability of foundation models.

03. Mitigation strategies are emerging but need further development.

Abstract

The rapid advancement of foundation models (FMs) across language, image, audio, and video domains has shown remarkable capabilities in diverse tasks. However, the proliferation of FMs brings forth a critical challenge: the potential to generate hallucinated outputs, particularly in high-stakes applications. The tendency of foundation models to produce hallucinated content arguably represents the biggest hindrance to their widespread adoption in real-world scenarios, especially in domains where reliability and accuracy are paramount. This survey paper presents a comprehensive overview of recent developments that aim to identify and mitigate the problem of hallucination in FMs, spanning text, image, video, and audio modalities. By synthesizing recent advancements in detecting and mitigating hallucination across various modalities, the paper aims to provide valuable insights for…

Taxonomy

Topics: Digital Media Forensic Detection · Aesthetic Perception and Analysis · Mental Health Research Topics