Towards Visual Semantics
Fausto Giunchiglia, Luca Erculiani, Andrea Passerini

TL;DR
This paper proposes a theory and algorithm to map visual perception concepts to linguistic classification concepts, enabling better integration of visual data and natural language understanding, with promising initial results.
Contribution
It introduces a novel framework for building substance concepts aligned with classification concepts, using visual objects, hierarchy, and human feedback.
Findings
Algorithm learns Genus and Differentia with reasonable accuracy
Effective with limited examples and partial supervision
Preliminary experiments show promising results
Abstract
Lexical Semantics is concerned with how words encode mental representations of the world, i.e., concepts. We call this type of concepts, classification concepts. In this paper, we focus on Visual Semantics, namely on how humans build concepts representing what they perceive visually. We call this second type of concepts, substance concepts. As shown in the paper, these two types of concepts are different and, furthermore, the mapping between them is many-to-many. In this paper we provide a theory and an algorithm for how to build substance concepts which are in a one-to-one correspondence with classification concepts, thus paving the way to the seamless integration between natural language descriptions and visual perception. This work builds upon three main intuitions: (i) substance concepts are modeled as visual objects, namely sequences of similar frames, as perceived in…
