DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Kazuki Matsuda, Yuiga Wada, Komei Sugiura

TL;DR
DENEB is a supervised automatic evaluation metric for image captioning designed to be robust against hallucinations. It uses the Sim-Vec Transformer to process multiple reference captions simultaneously and is trained on the Nebula dataset, which comprises 32,978 images paired with human judgments from 805 annotators.
Contribution
We introduce DENEB, a supervised evaluation metric that effectively handles hallucinations in image captioning by processing multiple references simultaneously with the Sim-Vec Transformer.
Findings
DENEB outperforms existing metrics on multiple datasets.
DENEB demonstrates robustness against hallucinations.
The Nebula dataset supports effective training of DENEB.
Abstract
In this work, we address the challenge of developing automatic evaluation metrics for image captioning, with a particular focus on robustness against hallucinations. Existing metrics are often inadequate for handling hallucinations, primarily due to their limited ability to compare candidate captions with multifaceted reference captions. To address this shortcoming, we propose DENEB, a novel supervised automatic evaluation metric specifically robust against hallucinations. DENEB incorporates the Sim-Vec Transformer, a mechanism that processes multiple references simultaneously, thereby efficiently capturing the similarity between an image, a candidate caption, and reference captions. To train DENEB, we construct the diverse and balanced Nebula dataset comprising 32,978 images, paired with human judgments provided by 805 annotators. We demonstrated that DENEB achieves state-of-the-art…
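The abstract describes comparing a candidate caption against an image and several reference captions at once. The sketch below illustrates one way such inputs could be turned into a set of similarity vectors, one per comparison target, that a downstream model could then process jointly. It is not the authors' implementation: the function name `similarity_vectors`, the choice of element-wise product and absolute difference as similarity features, and the random placeholder embeddings are all assumptions made for illustration.

```python
import numpy as np

def similarity_vectors(candidate, image, references):
    """Build one similarity vector per comparison target (hypothetical helper).

    candidate:  (d,) embedding of the candidate caption
    image:      (d,) embedding of the image
    references: (n_refs, d) embeddings of the reference captions
    Returns an array of shape (n_refs + 1, 2 * d):
    row 0 compares the candidate with the image,
    rows 1..n_refs compare it with each reference.
    """
    # Stack the image and all references so they are handled in one pass.
    targets = np.vstack([image[None, :], references])      # (n_refs + 1, d)
    # Assumed similarity features: element-wise product and absolute difference.
    product = candidate[None, :] * targets
    difference = np.abs(candidate[None, :] - targets)
    return np.concatenate([product, difference], axis=1)

# Placeholder embeddings; a real system would use learned encoders.
rng = np.random.default_rng(0)
d, n_refs = 8, 3
candidate = rng.normal(size=d)
image = rng.normal(size=d)
references = rng.normal(size=(n_refs, d))

tokens = similarity_vectors(candidate, image, references)
print(tokens.shape)  # (4, 16)
```

Nothing here is learned. In a trained metric, a model such as the Sim-Vec Transformer would take all rows together and regress a score against human judgments; this sketch only shows how multiple references can be represented side by side rather than compared one at a time.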
Taxonomy
Topics: Cell Image Analysis Techniques · Retinal Imaging and Analysis · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Layer Normalization · Dense Connections · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding
