Dreaming Out Loud: A Self-Synthesis Approach For Training Vision-Language Models With Developmentally Plausible Data
Badr AlKhamissi, Yingtian Tang, Abdülkadir Gökce, Johannes Mehrer, Martin Schrimpf

TL;DR
This paper introduces a developmentally inspired self-synthesis training method for vision-language models that effectively learns with limited data by mimicking human cognitive development stages.
Contribution
It proposes a novel four-phase self-synthesis approach that enables training multimodal models with less data, integrating language and vision in a developmentally plausible manner.
Findings
Demonstrates effective training with limited data
Achieves performance comparable to data-intensive methods
Progresses through stages from basic language to complex reasoning
Abstract
While today's large language models exhibit impressive abilities in generating human-like text, they require massive amounts of data during training. Here we take inspiration from human cognitive development to train models in limited-data conditions. Specifically, we present a self-synthesis approach that iterates through four phases. Phase 1 sets up fundamental language abilities, training the model from scratch on a small corpus. Language is then associated with the visual environment in Phase 2, integrating the model with a vision encoder to generate descriptive captions from labeled images. In the "self-synthesis" Phase 3, the model generates captions for unlabeled images, which it then uses to further train its language component with a mix of synthetic and previous real-world text. This phase is meant to expand the model's linguistic repertoire, similar to humans self-annotating…
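The Phase 3 data mix described in the abstract (self-generated captions for unlabeled images combined with previously seen real-world text) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_phase3_corpus`, the `caption_fn` stand-in for the Phase 2 vision-language model, and the `synthetic_fraction` mixing ratio are all hypothetical, since the abstract does not specify how the two text sources are proportioned.

```python
import random
from typing import Callable, Iterable, List


def build_phase3_corpus(
    unlabeled_images: Iterable[str],
    caption_fn: Callable[[str], str],
    real_texts: List[str],
    synthetic_fraction: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """Hypothetical sketch: mix self-generated captions with real text.

    `caption_fn` stands in for the Phase 2 model captioning an image;
    `synthetic_fraction` is an ASSUMED target share of synthetic text.
    """
    if not 0.0 < synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be in (0, 1]")

    # Self-synthesis: the model annotates images that have no labels.
    synthetic = [caption_fn(img) for img in unlabeled_images]

    # Keep enough real sentences that synthetic text makes up roughly
    # `synthetic_fraction` of the corpus, capped by what is available.
    wanted_real = round(len(synthetic) * (1 - synthetic_fraction) / synthetic_fraction)
    n_real = min(len(real_texts), wanted_real)

    rng = random.Random(seed)
    mixed = synthetic + rng.sample(real_texts, n_real)
    rng.shuffle(mixed)  # interleave the two sources for training
    return mixed


if __name__ == "__main__":
    images = ["img_001.jpg", "img_002.jpg"]
    toy_captioner = lambda path: f"a photo described by {path}"  # placeholder model
    real = ["the cat sat on the mat", "a dog runs", "birds fly south", "it is raining"]
    corpus = build_phase3_corpus(images, toy_captioner, real, synthetic_fraction=0.5)
    print(len(corpus))  # 4
```

With a 0.5 fraction, the two synthetic captions are paired with two sampled real sentences; the resulting list would then feed further training of the language component. Keeping the ratio as an explicit parameter makes it easy to vary how much the model relies on its own annotations.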
Taxonomy
Topics: Multimodal Machine Learning Applications
