Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling

Guangmin Zheng; Jin Wang; Xiaobing Zhou; Xuejie Zhang

arXiv:2405.09848·cs.CL·May 17, 2024

TL;DR

This paper introduces a soft negative sampling technique with bidirectional margin loss to reduce hallucinations in multimodal chain of thought reasoning, improving answer accuracy on complex reasoning tasks.

Contribution

It proposes a novel rationale generation method using soft negative sampling and bidirectional margin loss to mitigate hallucinations in multimodal CoT reasoning.

Findings

1. Improved answer accuracy on the ScienceQA dataset
2. Effective reduction of hallucinations in multimodal reasoning
3. Enhanced rationale quality with soft negative samples

Abstract

Chain of thought (CoT) has proven useful for problems requiring complex reasoning. Many of these problems are both textual and multimodal. Given the inputs in different modalities, a model generates a rationale and then uses it to answer a question. Because of the hallucination issue, the generated soft negative rationales with high textual quality but illogical semantics do not always help improve answer accuracy. This study proposes a rationale generation method using soft negative sampling (SNSE-CoT) to mitigate hallucinations in multimodal CoT. Five methods were applied to generate soft negative samples that shared highly similar text but had different semantics from the original. Bidirectional margin loss (BML) was applied to introduce them into the traditional contrastive learning framework that involves only positive and negative samples. Extensive experiments on the ScienceQA…
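The abstract names bidirectional margin loss without giving its form here. As a rough illustration only, the sketch below uses one common formulation from the soft-negative contrastive learning literature: the similarity gap between the soft negative and the positive is kept inside a band, so the soft negative is neither treated as a positive nor pushed away like an ordinary negative. The function names, the use of cosine similarity, and the margin values `alpha` and `beta` are assumptions for illustration, not details taken from this page or confirmed against the paper's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bidirectional_margin_loss(anchor, positive, soft_negative,
                              alpha=0.1, beta=0.3):
    """Hypothetical BML sketch (alpha and beta are assumed margins, alpha < beta).

    delta is the similarity of the soft negative minus that of the positive.
    The loss is zero only when -beta <= delta <= -alpha.
    """
    delta = cosine(anchor, soft_negative) - cosine(anchor, positive)
    # Penalize a soft negative that is as similar as the positive (delta > -alpha).
    too_close = max(0.0, delta + alpha)
    # Penalize a soft negative pushed further away than the band allows (delta < -beta).
    too_far = max(0.0, -delta - beta)
    return too_close + too_far


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    positive = [1.0, 0.0]
    for soft_negative in ([1.0, 0.0], [0.8, 0.6], [0.0, 1.0]):
        loss = bidirectional_margin_loss(anchor, positive, soft_negative)
        print(soft_negative, loss)
```

In a training setup this term would typically be added to a standard contrastive objective over positive and negative samples; how the two are weighted is not stated on this page.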

Code & Models

Repositories

zgmin/snse-cot (PyTorch, official)

Taxonomy

Topics: Complex Network Analysis Techniques · Advanced Text Analysis Techniques

Methods: Contrastive Learning