MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models
Sayak Chakrabarty, Souradip Pal

TL;DR
This paper presents MM-PoE, a multi-modal approach that improves the visual reasoning of vision-language models by mimicking human elimination strategies, yielding better zero-shot and few-shot performance on multiple-choice visual question-answering tasks across three benchmark datasets.
Contribution
Introduces MM-PoE, a two-step scoring method for multi-modal models that first eliminates implausible options and then chooses among the remaining ones, improving zero-shot and few-shot visual reasoning over approaches that evaluate each option independently.
Findings
Significant performance gains in zero-shot and few-shot settings
Effective elimination of implausible options improves reasoning accuracy
Broadens application of process of elimination to multi-modal visual reasoning
Abstract
This paper introduces Multiple Choice Reasoning via. Process of Elimination using Multi-Modal models, herein referred to as Multi-Modal Process of Elimination (MM-PoE). This novel methodology is engineered to augment the efficacy of Vision-Language Models (VLMs) in multiple-choice visual reasoning tasks. Diverging from conventional approaches that evaluate each option independently, MM-PoE employs a dual-step scoring paradigm that initially identifies and excludes implausible choices, subsequently concentrating on the most probable remaining options. This method emulates human test-taking strategies, where individuals typically eliminate clearly incorrect answers prior to selecting the optimal response. Our empirical evaluations, conducted across three benchmark datasets, reveal that MM-PoE significantly improves both zero-shot and few-shot performance of contemporary state-of-the-art…
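The two-step scoring idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `score_fn` is a hypothetical stand-in for a vision-language model that returns one plausibility score per option (higher is better), and the "drop options scoring below the mean" rule is an assumed elimination criterion that the summary above does not specify.

```python
from typing import Callable, List, Sequence


def eliminate(scores: Sequence[float]) -> List[int]:
    """Step 1: return indices of options that survive elimination.

    Assumption: an option is eliminated when its score falls below the
    mean score. The best option always survives, so the result is never empty.
    """
    mean = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s >= mean]


def poe_select(
    options: Sequence[str],
    score_fn: Callable[[Sequence[str]], Sequence[float]],
) -> int:
    """Return the index (into `options`) of the chosen answer.

    `score_fn` is a hypothetical scorer; in practice it would wrap a
    vision-language model conditioned on the image and the question.
    """
    # Step 1: score every option and discard the implausible ones.
    keep = eliminate(list(score_fn(options)))

    # Step 2: re-score only the surviving options and pick the best.
    survivors = [options[i] for i in keep]
    second = list(score_fn(survivors))
    best = max(range(len(keep)), key=lambda j: second[j])
    return keep[best]


# Toy scorer for demonstration only: a fixed lookup table, not a real model.
TOY_SCORES = {"cat": -1.0, "dog": -1.2, "car": -3.0, "tree": -4.0}


def toy_score_fn(opts: Sequence[str]) -> List[float]:
    return [TOY_SCORES[o] for o in opts]


choice = poe_select(["cat", "dog", "car", "tree"], toy_score_fn)
```

With the toy scorer, "car" and "tree" fall below the mean and are removed in the first step, and the second step chooses between "cat" and "dog". With a real model, the second pass would typically be conditioned on the reduced option set, so its scores could differ from the first pass.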
Taxonomy
Topics: Multi-Criteria Decision Making
