Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems

Wang Zhu; Jesse Thomason; Robin Jia

arXiv:2210.15037 · cs.CL · October 28, 2022 · 1 citation

Open Access

TL;DR

This paper compares end-to-end and neuro-symbolic vision-language reasoning systems on four types of out-of-distribution tests, shows that each paradigm is robust in different settings, and argues that robustness should be evaluated with a diverse set of tests.

Contribution

It analyzes how the two paradigms perform under four types of generalization tests (segment-combine, contrast set, compositional generalization, and cross-benchmark transfer) and highlights their complementary benefits.

Findings

01

End-to-end systems show significant performance drops on all tests.

02

Neuro-symbolic methods suffer larger drops than end-to-end systems on cross-benchmark transfer from GQA to VQA, but show smaller accuracy drops on the other generalization tests.

03

Few-shot training quickly improves neuro-symbolic methods' performance.

Abstract

For vision-and-language reasoning tasks, both fully connectionist, end-to-end methods and hybrid, neuro-symbolic methods have achieved high in-distribution performance. In which out-of-distribution settings does each paradigm excel? We investigate this question on both single-image and multi-image visual question-answering through four types of generalization tests: a novel segment-combine test for multi-image queries, contrast set, compositional generalization, and cross-benchmark transfer. Vision-and-language end-to-end trained systems exhibit sizeable performance drops across all these tests. Neuro-symbolic methods suffer even more on cross-benchmark transfer from GQA to VQA, but they show smaller accuracy drops on the other generalization tests and their performance quickly improves by few-shot training. Overall, our results demonstrate the complementary benefits of these two…
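The abstract compares systems by how much their accuracy drops when moving from in-distribution data to each generalization test. As a minimal sketch of that arithmetic, and not the authors' evaluation code, the snippet below computes exact-match accuracy on two answer sets and the absolute drop between them. The function names, the exact-match criterion, and the toy answers are all assumptions made for illustration; real VQA benchmarks may use a different scoring rule.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold answers.

    Exact string match is an assumption made for this sketch.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def accuracy_drop(in_dist_acc: float, ood_acc: float) -> float:
    """Absolute accuracy lost when moving from in-distribution to OOD data."""
    return in_dist_acc - ood_acc


# Toy, made-up predictions and gold answers (hypothetical, not from the paper).
in_dist_preds = ["yes", "no", "red", "2"]
in_dist_gold = ["yes", "no", "red", "3"]
ood_preds = ["yes", "yes", "blue", "2"]
ood_gold = ["yes", "no", "red", "2"]

in_acc = accuracy(in_dist_preds, in_dist_gold)   # 3 of 4 correct
ood_acc = accuracy(ood_preds, ood_gold)          # 2 of 4 correct
drop = accuracy_drop(in_acc, ood_acc)

print(f"in-distribution: {in_acc:.2f}, OOD: {ood_acc:.2f}, drop: {drop:.2f}")
```

Running the same comparison once per generalization test, for each system, would give the kind of per-test drops that the abstract contrasts between the two paradigms.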

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Test