Hallucination Mitigation Prompts Long-term Video Understanding

Yiwei Sun; Zhihang Liu; Chuanbin Liu; Bowei Pu; Zhihan Zhang; Hongtao Xie

arXiv:2406.11333·cs.CV·June 18, 2024

TL;DR

This paper introduces a hallucination mitigation pipeline for long video understanding using multimodal large language models, improving accuracy and reducing hallucinations in long video question answering tasks.

Contribution

It presents a novel pipeline combining frame sampling, question-guided visual feature extraction, and answer generation techniques to mitigate hallucinations in long video understanding.

Findings

01  Achieved 84.2% accuracy on the MovieChat dataset

02  Surpassed the baseline by 29.1% in global mode

03  Won third place in the CVPR LOVEU 2024 challenge

Abstract

Recently, multimodal large language models (MLLMs) have made significant advances in video understanding tasks. However, their ability to understand unprocessed long videos is very limited, primarily because of the enormous memory overhead such videos impose. Although existing methods balance memory against information by aggregating frames, they inevitably introduce severe hallucinations. To address this issue, this paper constructs a comprehensive hallucination mitigation pipeline based on existing MLLMs. Specifically, we use the CLIP Score to guide the frame sampling process with questions, selecting key frames relevant to the question. Then, we inject question information into the queries of the image Q-former to obtain more important visual features. Finally, during the answer generation stage, we utilize chain-of-thought and in-context learning techniques…
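The question-guided frame sampling step described in the abstract can be illustrated with a minimal sketch. It assumes that frame and question embeddings have already been produced by a CLIP image and text encoder (not shown here), uses plain cosine similarity as a stand-in for the CLIP Score, and picks the top-k frames. The function names `cosine_similarity` and `select_key_frames`, the parameter `k`, and the top-k selection rule are all hypothetical choices for illustration; the paper's actual selection procedure may differ.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_key_frames(
    frame_embeddings: Sequence[Sequence[float]],
    question_embedding: Sequence[float],
    k: int,
) -> List[int]:
    """Return indices of the k frames most similar to the question.

    Assumption: embeddings come from a CLIP-style encoder pair, so that
    similarity between a frame and the question is meaningful.
    Indices are returned in temporal order so downstream stages still
    see the frames as a chronologically ordered clip.
    """
    if k <= 0:
        return []
    scores = [cosine_similarity(f, question_embedding) for f in frame_embeddings]
    # Rank frame indices by descending similarity to the question.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the best k, then restore chronological order.
    return sorted(ranked[:k])


# Toy example with 2-D stand-in embeddings (real CLIP embeddings are much larger).
frames = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
question = [1.0, 0.0]
print(select_key_frames(frames, question, k=2))
```

Returning the selected indices in temporal order rather than score order is a design choice of this sketch: it keeps the sampled frames usable as a coherent sequence for later stages.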

Taxonomy

Topics: Psychedelics and Drug Studies · Hallucinations in medical conditions · Psychosomatic Disorders and Their Treatments

Methods: Contrastive Language-Image Pre-training