Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor

Jiali Chen; Xusen Hei; Yuqi Xue; Yuancheng Wei; Jiayuan Xie; Yi Cai; Qing Li

arXiv:2412.07801·cs.CV·December 12, 2024

TL;DR

This paper introduces a benchmark (VCR-DF) and a model (PEIFG) that enable large multimodal models (LMMs) to generate explainable feedback for correcting errors in visual commonsense reasoning (VCR) distractors, inspired by how human teachers correct their students.

Contribution

It pioneers the simulation of error correction in LMMs for VCR, introduces the VCR-DF dataset, and proposes the PEIFG model with expert prompts for improved feedback generation.

Findings

01 · PEIFG outperforms existing LMMs in feedback quality

02 · VCR-DF serves as a new benchmark for error correction in VCR

03 · The approach enhances LMMs' ability to identify misconceptions

Abstract

Large multimodal models (LMMs) have shown remarkable performance on the visual commonsense reasoning (VCR) task, which aims to answer a multiple-choice question based on visual commonsense within an image. However, the ability of LMMs to correct potential visual commonsense errors in the distractors as they occur remains under-explored. Drawing inspiration from how a human teacher crafts challenging distractors to test students' comprehension of concepts or skills and helps them identify and correct errors on the way to the answer, we present the first study in which LMMs simulate this error correction process. To this end, we employ GPT-4 as a "teacher" to collect the explainable feedback dataset VCR-DF for error correction, which serves as a benchmark to evaluate the ability of LMMs to identify misconceptions and clarify the reasons behind errors in VCR distractors…

Code & Models

Repositories

gary-code/peifg · PyTorch · Official

Taxonomy

Methods: Attention Is All You Need · Adam · Dropout · Position-Wise Feed-Forward Layer · Softmax · Dense Connections · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Label Smoothing