Can machines learn to see without visual databases?

Alessandro Betti; Marco Gori; Stefano Melacci; Marcello Pelillo; Fabio Roli

arXiv:2110.05973·cs.CV·November 23, 2021

Open Access

TL;DR

This paper advocates for developing vision-learning machines that acquire visual skills through minimal human interaction, without relying on large visual databases, aiming for more human-like learning processes.

Contribution

It proposes a new approach to machine vision that emphasizes learning from limited supervision and natural interactions rather than extensive visual datasets.

Findings

01 Highlights the potential of minimal supervision for visual learning.

02 Suggests new foundations for computational vision processes.

03 Encourages alternative vision learning paradigms beyond deep learning.

Abstract

This paper argues that the time has come to think of learning machines that acquire visual skills in a truly human-like context, where only a few human-like object supervisions are given, through vocal interactions and pointing aids. This likely requires new foundations for the computational processes of vision, with the final purpose of involving machines in visual description tasks while they live in their own visual environment under simple man-machine linguistic interactions. The challenge consists of developing machines that learn to see without needing to handle visual databases. This might open the doors to a truly orthogonal competitive track with respect to deep learning technologies for vision, one that does not rely on the accumulation of huge visual databases.

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection