FiLM: Visual Reasoning with a General Conditioning Layer

Ethan Perez; Florian Strub; Harm de Vries; Vincent Dumoulin; Aaron Courville

arXiv:1709.07871 · cs.CV · December 20, 2017 · 187 cites

TL;DR

FiLM is a general-purpose conditioning layer that applies a feature-wise affine transformation to a network's intermediate features based on conditioning information. On visual reasoning tasks it halves state-of-the-art error on the CLEVR benchmark and generalizes to new data from few examples or zero-shot.

Contribution

The paper presents FiLM, a novel feature-wise linear modulation technique that enhances neural networks' ability to perform complex visual reasoning tasks.

Findings

01 Halves state-of-the-art error on the CLEVR benchmark

02 Modulates features in a coherent manner for reasoning tasks

03 Generalizes well to challenging new data from few examples or zero-shot

Abstract

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
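The feature-wise affine transformation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the linear `film_generator`, the weight names, and all tensor sizes are assumptions chosen for the example. It shows only the core idea that conditioning information produces a per-channel scale `gamma` and shift `beta`, which are then applied to each feature map.

```python
import numpy as np


def film_generator(cond, w_gamma, b_gamma, w_beta, b_beta):
    """Hypothetical linear generator mapping a conditioning vector to
    per-channel scale (gamma) and shift (beta). Any function of the
    conditioning input could play this role; linear is an assumption."""
    gamma = cond @ w_gamma + b_gamma  # (batch, channels)
    beta = cond @ w_beta + b_beta     # (batch, channels)
    return gamma, beta


def film(features, gamma, beta):
    """Feature-wise affine transformation: gamma * features + beta.

    features: (batch, channels, height, width)
    gamma, beta: (batch, channels), broadcast over the spatial dimensions.
    """
    return gamma[:, :, None, None] * features + beta[:, :, None, None]


# Illustrative sizes only (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
batch, cond_dim, channels, height, width = 2, 8, 4, 5, 5

cond = rng.normal(size=(batch, cond_dim))                     # e.g. a question embedding
features = rng.normal(size=(batch, channels, height, width))  # e.g. CNN feature maps

w_gamma = 0.1 * rng.normal(size=(cond_dim, channels))
b_gamma = np.ones(channels)   # start near the identity scale
w_beta = 0.1 * rng.normal(size=(cond_dim, channels))
b_beta = np.zeros(channels)   # start near zero shift

gamma, beta = film_generator(cond, w_gamma, b_gamma, w_beta, b_beta)
modulated = film(features, gamma, beta)
print(modulated.shape)  # (2, 4, 5, 5)
```

With `gamma` fixed to one and `beta` to zero the layer reduces to the identity, so the conditioning input only changes the computation to the extent that the generator moves these two values away from that point.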

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques