Hallucination Detection for Grounded Instruction Generation

Lingjun Zhao; Khanh Nguyen; Hal Daumé III

arXiv:2310.15319 · cs.CL · October 25, 2023 · 1 citation

TL;DR

This paper presents a model that detects hallucinated references in generated navigation instructions for simulated residential environments. The model fine-tunes a model pre-trained on image-text pairs with a contrastive loss and outperforms several baselines.

Contribution

It introduces a novel approach to hallucination detection in grounded instructions: a pre-trained image-text model is fine-tuned with a contrastive loss, and the resulting detector outperforms several baselines.
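The contrastive fine-tuning step can be illustrated with a minimal sketch. The exact loss is not spelled out here, so this assumes an InfoNCE-style softmax form: the model's score for the correct instruction (paired with its path) should exceed the scores of the same path paired with hallucinated variants. The scores are plain floats standing in for outputs of the fine-tuned image-text model; `contrastive_loss` and `temperature` are hypothetical names chosen for illustration.

```python
import math
from typing import Sequence


def contrastive_loss(pos_score: float, neg_scores: Sequence[float],
                     temperature: float = 1.0) -> float:
    """InfoNCE-style loss (assumed form, not necessarily the paper's exact loss).

    pos_score:  score of the path paired with the correct instruction.
    neg_scores: scores of the same path paired with instructions that
                contain synthesized hallucinations.
    Returns the negative log-probability that the correct instruction
    wins a softmax over all candidates.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    # Index 0 holds the correct instruction.
    return log_z - logits[0]


if __name__ == "__main__":
    # The loss shrinks as the correct instruction's score pulls ahead.
    print(contrastive_loss(1.0, [0.5, -1.0]))
    print(contrastive_loss(3.0, [0.5, -1.0]))
```

When the correct and hallucinated instructions score equally against a single negative, the loss is log 2; minimizing it drives the two scores apart, which is what "separating correct instructions from instructions containing synthesized hallucinations" requires.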

Findings

01

The proposed model outperforms baseline methods in hallucination detection.

02

Contrastive fine-tuning improves detection accuracy.

03

Pre-trained image-text models are effective for this task.

Abstract

We investigate the problem of generating instructions to guide humans to navigate in simulated residential environments. A major issue with current models is hallucination: they generate references to actions or objects that are inconsistent with what a human follower would perform or encounter along the described path. We develop a model that detects these hallucinated references by adopting a model pre-trained on a large corpus of image-text pairs, and fine-tuning it with a contrastive loss that separates correct instructions from instructions containing synthesized hallucinations. Our final model outperforms several baselines, including one that uses word probabilities estimated by the instruction-generation model, and supervised models based on LSTM and Transformer architectures.
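Two concrete ingredients named in the abstract, synthesized hallucinations and the word-probability baseline, can be sketched briefly. This is a minimal illustration under stated assumptions, not the authors' procedure: hallucinations are synthesized by swapping one direction or object word for an inconsistent alternative drawn from small hypothetical lexicons (`DIRECTION_SWAPS`, `OBJECT_SWAPS`), and the baseline flags any token whose generator-assigned probability falls below a hypothetical `threshold`.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical swap lexicons; a real system would derive candidate
# replacements from the dataset's vocabulary of actions and objects.
DIRECTION_SWAPS = {"left": ["right"], "right": ["left"],
                   "up": ["down"], "down": ["up"]}
OBJECT_SWAPS = {"kitchen": ["bedroom", "bathroom"],
                "bedroom": ["kitchen", "hallway"],
                "stairs": ["door", "couch"],
                "door": ["stairs", "window"]}


def synthesize_hallucination(tokens: List[str], rng: random.Random
                             ) -> Optional[Tuple[List[str], int]]:
    """Replace one action/object word with an inconsistent alternative.

    Returns (perturbed_tokens, index_of_hallucinated_token), or None when
    no token in the instruction appears in either lexicon.
    """
    candidates = [i for i, t in enumerate(tokens)
                  if t in DIRECTION_SWAPS or t in OBJECT_SWAPS]
    if not candidates:
        return None
    i = rng.choice(candidates)
    swaps = DIRECTION_SWAPS.get(tokens[i]) or OBJECT_SWAPS[tokens[i]]
    perturbed = list(tokens)
    perturbed[i] = rng.choice(swaps)
    return perturbed, i


def flag_low_probability(token_probs: List[float],
                         threshold: float = 0.1) -> List[bool]:
    """Word-probability baseline: flag a token as hallucinated when the
    instruction generator assigned it a probability below `threshold`."""
    return [p < threshold for p in token_probs]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "walk past the stairs and turn left into the kitchen".split()
    result = synthesize_hallucination(tokens, rng)
    if result is not None:
        perturbed, idx = result
        print(" ".join(perturbed), "| hallucinated token at index", idx)
    print(flag_low_probability([0.9, 0.05, 0.5]))
```

Each synthesized instruction pairs naturally with its original as a (correct, hallucinated) example for the contrastive objective, and the recorded index gives a token-level label for the reference that was altered.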

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Absolute Position Encodings · Tanh Activation · Adam · Label Smoothing · Position-Wise Feed-Forward Layer