How clever is the FiLM model, and how clever can it be?
Alexander Kuhnle, Huiyuan Xie, Ann Copestake

TL;DR
This paper critically examines the FiLM model's ability to learn complex linguistic structures, revealing limitations in relational reasoning and the importance of dataset composition and pretraining strategies.
Contribution
It provides a detailed analysis of which linguistic constructions FiLM can learn and argues that approaches relying on big, all-encompassing datasets may have fundamental limitations.
Findings
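FiLM cannot learn relational statements directly, except for very simple instances
Training on a broader set of instances, or pretraining on simpler instance types, helps alleviate these learning difficulties
Mixing datasets is less robust than pretraining and very sensitive to the compositional structure of the dataset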
Abstract
The FiLM model achieves close-to-perfect performance on the diagnostic CLEVR dataset and is distinguished from other such models by having a comparatively simple and easily transferable architecture. In this paper, we investigate in more detail the ability of FiLM to learn various linguistic constructions. Our main results show that (a) FiLM is not able to learn relational statements straight away except for very simple instances, (b) training on a broader set of instances as well as pretraining on simpler instance types can help alleviate these learning difficulties, (c) mixing is less robust than pretraining and very sensitive to the compositional structure of the dataset. Overall, our results suggest that the approach of big all-encompassing datasets and the paradigm of "the effectiveness of data" may have fundamental limitations.
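The difference between the pretraining and mixing regimes named in results (b) and (c) can be illustrated with a minimal sketch. This is not the authors' training code: the function names, the `hard_fraction` parameter, and the placeholder instance strings are all assumptions made for illustration. The sketch only shows the order in which training instances would be presented under each regime.

```python
import random


def pretraining_schedule(simple, hard, pretrain_steps, total_steps, rng):
    """Yield one instance per step: simple instances first, then only hard ones."""
    for step in range(total_steps):
        pool = simple if step < pretrain_steps else hard
        yield rng.choice(pool)


def mixing_schedule(simple, hard, hard_fraction, total_steps, rng):
    """Yield one instance per step, sampling from both pools throughout training."""
    for _ in range(total_steps):
        pool = hard if rng.random() < hard_fraction else simple
        yield rng.choice(pool)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical placeholder instances, not taken from the paper's datasets.
    simple = ["existential-1", "existential-2"]
    hard = ["relational-1", "relational-2"]
    print(list(pretraining_schedule(simple, hard, 3, 6, rng)))
    print(list(mixing_schedule(simple, hard, 0.5, 6, rng)))
```

Under the pretraining schedule the model sees a clean phase boundary, whereas under mixing the proportion of each instance type is a property of the dataset itself, which is one plausible reading of why the abstract describes mixing as sensitive to the dataset's compositional structure.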
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
