Aligning Visual and Lexical Semantics
Fausto Giunchiglia, Mayukh Bagchi, Xiaolei Diao

TL;DR
This paper examines the mismatch between visual and lexical semantics in computer vision and proposes a domain-agnostic method for aligning the two in order to address the Semantic Gap Problem.
Contribution
The paper introduces a domain-agnostic methodology for aligning visual and lexical semantics, with the aim of bridging the Semantic Gap in computer vision systems.
Findings
Demonstrates, through extensive examples, that visual and lexical semantics do not coincide.
Proposes a general methodology for semantic alignment.
Highlights potential improvements in CV system understanding.
Abstract
We discuss two kinds of semantics relevant to Computer Vision (CV) systems - Visual Semantics and Lexical Semantics. While visual semantics focus on how humans build concepts when using vision to perceive a target reality, lexical semantics focus on how humans build concepts of the same target reality through the use of language. The lack of coincidence between visual and lexical semantics, in turn, has a major impact on CV systems in the form of the Semantic Gap Problem (SGP). The paper, while extensively exemplifying the lack of coincidence as above, introduces a general, domain-agnostic methodology to enforce alignment between visual and lexical semantics.
Taxonomy
Topics: Categorization, perception, and language · Constraint Satisfaction and Optimization · Visual Attention and Saliency Detection
