Improving Generalization in Visual Reasoning via Self-Ensemble

Tien-Huy Nguyen; Quang-Khai Tran; Anh-Tuan Quang-Hoang

arXiv:2410.20883 · cs.CV · November 4, 2024

Open Access

TL;DR

This paper introduces self-ensemble, a training-free method that enhances the generalization and reasoning abilities of large vision-language models (LVLMs), achieving state-of-the-art results on multiple visual reasoning benchmarks.

Contribution

The paper presents self-ensemble, a novel technique that lets a single LVLM ensemble with itself, improving visual reasoning without any parameter updates.

Findings

01. Achieves state-of-the-art performance on SketchyVQA
02. Improves out-of-distribution VQA accuracy
03. Enhances model generalization without additional training

Abstract

The cognitive faculty of visual reasoning requires integrating multimodal perceptual processing with commonsense and external knowledge of the world. In recent years, many large vision-language models (LVLMs) have been proposed, showing strong proficiency in commonsense reasoning across diverse domains and tasks. However, training such LVLMs demands substantial and costly resources. Rather than training LVLMs from scratch on large datasets, recent approaches explore ways to exploit the capabilities of multiple existing LVLMs, for example through ensemble methods. In this work, we propose self-ensemble, a novel training-free method that improves the model's generalization and visual reasoning without updating any parameters. Our key insight is that an LVLM can ensemble with itself without the need…
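The abstract names ensembling as the usual way to combine the strengths of several LVLMs, but the excerpt is truncated before it explains how self-ensemble itself works. For orientation only, the sketch below shows a generic answer-level ensemble: majority voting over candidate answers, as one might collect from several models or several runs of one model. It is not the authors' method. The function names `normalize_answer` and `majority_vote`, the sample answers, and the normalization rule are hypothetical simplifying assumptions.

```python
from collections import Counter
from typing import Sequence

# Generic majority-vote ensemble over answer strings.
# Assumption: this is a textbook illustration, NOT the paper's self-ensemble method.


def normalize_answer(answer: str) -> str:
    """Lowercase, trim whitespace, and drop trailing . ! ? (hypothetical rule)."""
    return answer.strip().lower().rstrip(".!?")


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent normalized answer.

    Ties are broken by first occurrence, because Counter.most_common
    orders elements with equal counts in the order first encountered.
    """
    if not answers:
        raise ValueError("need at least one answer to ensemble")
    counts = Counter(normalize_answer(a) for a in answers)
    return counts.most_common(1)[0][0]


# Hypothetical candidate answers to one VQA question.
candidate_answers = ["A cat.", "a cat", "a dog", "A cat"]
print(majority_vote(candidate_answers))  # a cat
```

Voting at the answer level treats each model or run as a black box, so it needs no parameter updates; the cost is one extra inference pass per vote.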

Taxonomy

Topics: Neural Networks and Applications · Constraint Satisfaction and Optimization · Fuzzy Logic and Control Systems

Methods: Focus