Learning to see like children: proof of concept
Marco Gori, Marco Lippi, Marco Maggini, Stefano Melacci

TL;DR
This paper introduces a developmental learning framework for visual agents that mimics children's learning, leveraging motion coherence and minimal supervision to achieve semantic labeling in complex environments.
Contribution
The paper proposes a novel protocol for visual learning inspired by how children learn. The protocol relies on motion coherence and constraints, follows a lifelong learning approach, and is opened to crowd-sourced evaluation.
Findings
Semantic labeling emerges with few supervised examples.
Motion coherence provides extensive supervision signals.
The framework supports lifelong learning without a clear training/test separation.
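To make the motion coherence finding concrete, the sketch below shows one simple way such a signal could be scored: per-pixel features in one frame are compared with the features found at the motion-displaced position in the next frame. This is an illustration only, not the formulation used in the paper. The function name `motion_coherence_penalty`, the array layout, and the use of integer displacements are all assumptions made for this example.

```python
import numpy as np


def motion_coherence_penalty(feat_t, feat_t1, flow):
    """Hypothetical motion coherence score (not the paper's formulation).

    feat_t, feat_t1: per-pixel feature maps of shape (H, W, C) for two
        consecutive frames.
    flow: integer array of shape (H, W, 2) holding the (dy, dx)
        displacement of each pixel from frame t to frame t+1.

    Returns the mean squared difference between each pixel's features in
    frame t and the features at its displaced position in frame t+1.
    Pixels that move outside the image are ignored.
    """
    h, w, _ = feat_t.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ty = ys + flow[..., 0]
    tx = xs + flow[..., 1]
    # Keep only pixels whose displaced position is still inside the frame.
    valid = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)
    if not valid.any():
        return 0.0
    diff = feat_t[valid] - feat_t1[ty[valid], tx[valid]]
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame_a = rng.normal(size=(4, 5, 3))
    # Simulate a scene that moves one pixel to the right.
    frame_b = np.roll(frame_a, 1, axis=1)
    flow = np.zeros((4, 5, 2), dtype=int)
    flow[..., 1] = 1
    print(motion_coherence_penalty(frame_a, frame_b, flow))
```

A low score means the features follow the motion between frames, so every pair of consecutive frames contributes a training signal without any human label. In a learning system a term of this kind would be minimized together with the few supervised examples.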
Abstract
In the last few years we have seen a growing interest in machine learning approaches to computer vision and, especially, to semantic labeling. Today's state-of-the-art systems apply deep learning to millions of labeled images, with very successful results on benchmarks, though similar results are unlikely in unrestricted visual environments. Most learning schemes essentially ignore the inherent sequential structure of videos: this might be a critical issue, since any visual recognition process becomes remarkably more complex when video frames are shuffled. Based on this remark, we propose a re-foundation of the communication protocol between visual agents and the environment, which is referred to as learning to see like children. As in human interaction, visual concepts are acquired by the agents solely by processing their own visual stream along with human supervisions on selected…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
