TL;DR
This paper introduces a transductive semi-supervised approach for visual verb sense disambiguation that effectively leverages limited labeled data and multimodal features, significantly outperforming previous methods.
Contribution
It proposes a novel graph-based label propagation method for VVSD in a transductive semi-supervised setting, reducing the need for extensive annotated data.
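To make the idea of transductive label propagation concrete, here is a minimal sketch. The summary only says "graph-based label propagation", so the code uses the generic label-spreading iteration F ← αSF + (1 − α)Y as a stand-in; it is not necessarily the update rule used in the paper. The function name `propagate_labels`, the cosine-similarity graph, the parameters `alpha` and `n_iter`, and the toy feature vectors are all illustrative assumptions.

```python
import numpy as np


def propagate_labels(features, labels, n_classes, alpha=0.9, n_iter=50):
    """Generic label spreading over a cosine-similarity graph (illustrative only).

    features: (n, d) array, one feature vector per sample (e.g. a fused
              image/text embedding; this choice is an assumption).
    labels:   length-n integer array; class index for labeled samples,
              -1 for unlabeled ones.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n = features.shape[0]

    # Edge weights: non-negative cosine similarity, no self-loops.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    W = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # One-hot rows for labeled samples, all-zero rows for unlabeled ones.
    Y = np.zeros((n, n_classes))
    labeled_idx = np.flatnonzero(labels >= 0)
    Y[labeled_idx, labels[labeled_idx]] = 1.0

    # Repeatedly diffuse label mass along graph edges while pulling
    # labeled nodes back toward their known labels.
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y

    pred = F.argmax(axis=1)
    pred[labeled_idx] = labels[labeled_idx]  # keep the given labels fixed
    return pred


# Toy data: two tight clusters, one labeled sample per cluster.
features = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
            [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]]
labels = [0, -1, -1, 1, -1, -1]
pred = propagate_labels(features, labels, n_classes=2)
print(pred)
```

In this toy input, each unlabeled point picks up the label of the single labeled point in its own cluster, which is the behavior that lets a transductive method work from only a few annotated samples.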
Findings
Outperforms the state of the art on the VerSe dataset
Uses only a small fraction of labeled samples
Effective multimodal representation integration
Abstract
Verb Sense Disambiguation is a well-known task in NLP whose aim is to find the correct sense of a verb in a sentence. Recently, this problem has been extended to a multimodal scenario by exploiting both textual and visual features of ambiguous verbs, leading to a new problem: Visual Verb Sense Disambiguation (VVSD). Here, the sense of a verb is assigned based on the content of an image paired with it, rather than on a sentence in which the verb appears. Annotating a dataset for this task is more complex than for textual disambiguation, because assigning the correct sense to an (image, verb) pair requires both non-trivial linguistic and visual skills. In this work, unlike in the prior literature, the VVSD task is performed in a transductive semi-supervised learning (SSL) setting, in which only a small amount of labeled information is required, greatly reducing the need for…
