TL;DR
This paper introduces Visual Taxonomy Expansion (VTE), an approach that combines visual features with textual semantics to improve taxonomy expansion, especially for unseen terms. It outperforms existing methods and ChatGPT on Chinese datasets.
Contribution
The paper presents a new multimodal framework combining textual and visual semantics with a hyper-proto constraint for enhanced taxonomy expansion.
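As a rough illustration of the task setting only (not the VTE model, whose architecture and hyper-proto constraint are not detailed here), the sketch below attaches a new term to the candidate hypernym whose fused text-plus-visual embedding is most similar to its own. The function names `fuse` and `attach`, the concatenation-based fusion, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fuse(text_vec, visual_vec):
    """Hypothetical fusion step: plain concatenation of the two modalities."""
    return list(text_vec) + list(visual_vec)


def attach(query, candidates):
    """Return the candidate hypernym most similar to the query term.

    query:      (text_vec, visual_vec) for the new term
    candidates: dict mapping hypernym name -> (text_vec, visual_vec)
    """
    q = fuse(*query)
    return max(candidates, key=lambda name: cosine(q, fuse(*candidates[name])))


# Toy embeddings, invented for illustration only.
candidates = {
    "fruit": ([1.0, 0.0], [0.9, 0.1]),
    "vehicle": ([0.0, 1.0], [0.1, 0.9]),
}
query = ([0.8, 0.2], [0.7, 0.3])
best = attach(query, candidates)
```

In a learned system, the hand-written vectors would be replaced by encoder outputs and the similarity by a trained scoring function; concatenation is used here only because it is the simplest way to show both modalities contributing to the decision.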
Findings
Improves accuracy by 8.75% on the Chinese taxonomy dataset.
Outperforms ChatGPT in taxonomy expansion tasks.
Demonstrates effectiveness of visual features in taxonomy tasks.
Abstract
The taxonomy expansion task is essential for organizing the ever-increasing volume of new concepts into existing taxonomies. Most existing methods focus exclusively on textual semantics, which leaves them unable to generalize to unseen terms and gives rise to the "Prototypical Hypernym Problem." In this paper, we propose Visual Taxonomy Expansion (VTE), which introduces visual features into the taxonomy expansion task. We propose a textual hypernymy learning task and a visual prototype learning task to cluster textual and visual semantics. In addition to the tasks on the respective modalities, we introduce a hyper-proto constraint that integrates textual and visual semantics to produce fine-grained visual semantics. Our method is evaluated on two datasets, where we obtain compelling results. Specifically, on the Chinese taxonomy dataset, our method improves accuracy by 8.75%. Additionally, our…
