Transfer Learning in Visual and Relational Reasoning
T.S. Jayram, Vincent Marois, Tomasz Kornuta, Vincent Albouy, Emre Sevgen, and Ahmet S. Ozcan

TL;DR
This paper formalizes transfer learning for visual and relational reasoning, introduces a new model, SAMNet, that achieves state-of-the-art results on CLEVR and COG, and examines how reasoning capabilities, not just visual features, can be transferred between visual tasks.
Contribution
The paper provides a theoretical framework for transfer learning in visual reasoning and introduces SAMNet, a novel end-to-end differentiable recurrent model with improved transfer learning performance.
Findings
SAMNet achieves state-of-the-art accuracy on CLEVR and COG datasets.
SAMNet's architecture effectively decouples reasoning from sequence length.
The model's selective attention improves transfer learning in visual reasoning tasks.
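SAMNet's internals are not described on this page, so the following is only a toy sketch of what "decoupling reasoning from sequence length" can mean for a recurrent model. It assumes a fixed number of reasoning steps per frame and a memory carried from frame to frame. The functions `reason_over_frame` and `process_video` and the simple averaging update are hypothetical stand-ins, not the paper's architecture.

```python
from typing import List, Tuple

Vector = List[float]


def reason_over_frame(frame: Vector, memory: Vector, num_steps: int) -> Vector:
    """Hypothetical reasoning cell: a fixed number of steps on one frame.

    Each step blends the frame's features into memory. It stands in for a
    learned attend/read/write cell and is NOT SAMNet's actual update rule.
    """
    for _ in range(num_steps):
        memory = [0.5 * m + 0.5 * f for m, f in zip(memory, frame)]
    return memory


def process_video(frames: List[Vector], num_steps: int = 4) -> Tuple[Vector, int]:
    """Apply the same cell to every frame, carrying memory between frames.

    The cell and num_steps are fixed, so the "model" is identical for a
    4-frame or a 40-frame video; only the number of loop iterations grows.
    Returns the final memory and the total number of reasoning steps run.
    """
    memory = [0.0] * len(frames[0])
    total_steps = 0
    for frame in frames:
        memory = reason_over_frame(frame, memory, num_steps)
        total_steps += num_steps
    return memory, total_steps


short_mem, short_steps = process_video([[1.0, 0.0]] * 4)
long_mem, long_steps = process_video([[1.0, 0.0]] * 40)
print(short_steps, long_steps)  # 16 160
```

The point of the design is that nothing in the cell depends on the number of frames, so the same reasoning routine can in principle be reused on sequences longer than those seen in training.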
Abstract
Transfer learning has become the de facto standard in computer vision and natural language processing, especially where labeled data is scarce. Accuracy can be significantly improved by using pre-trained models and subsequent fine-tuning. In visual reasoning tasks, such as image question answering, transfer learning is more complex. In addition to transferring the capability to recognize visual features, we also expect to transfer the system's ability to reason. Moreover, for video data, temporal reasoning adds another dimension. In this work, we formalize these unique aspects of transfer learning and propose a theoretical framework for visual reasoning, exemplified by the well-established CLEVR and COG datasets. Furthermore, we introduce a new, end-to-end differentiable recurrent model (SAMNet), which shows state-of-the-art accuracy and better performance in transfer learning on both…
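As a hedged illustration of "pre-trained models and subsequent fine-tuning", the toy below assumes the simplest variant: the pre-trained feature extractor is frozen and only a new linear head is trained on a small target dataset (often called linear probing). All names, weights, and data are hypothetical. The paper's setting additionally transfers the ability to reason, which this sketch does not model.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Sample = Tuple[List[float], float]


def extract_features(x: List[float], w_feat: Matrix) -> List[float]:
    """Frozen 'pre-trained' extractor: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_feat]


def fine_tune_head(data: List[Sample], w_feat: Matrix,
                   lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Train only a linear head on top of the frozen features with SGD.

    w_feat is read but never updated, mimicking reuse of pre-trained weights
    on a small labeled target set.
    """
    head = [0.0] * len(w_feat)
    for _ in range(epochs):
        for x, y in data:
            h = extract_features(x, w_feat)
            pred = sum(a * b for a, b in zip(head, h))
            err = pred - y  # gradient of 0.5 * err**2 w.r.t. pred
            head = [a - lr * err * b for a, b in zip(head, h)]
    return head


# Hypothetical "pre-trained" weights and a tiny target task: y = 2*x0 + 3*x1.
w_pretrained = [[1.0, 0.0], [0.0, 1.0]]
target_data = [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0), ([1.0, 1.0], 5.0)]
head = fine_tune_head(target_data, w_pretrained)
print([round(v, 3) for v in head])  # [2.0, 3.0]
```

Full fine-tuning would also update `w_feat`, usually with a smaller learning rate. Freezing it keeps the sketch short and makes the reuse of pre-trained features explicit.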
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
