Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning

Xiangyan Qu; Jing Yu; Keke Gai; Jiamin Zhuang; Yuanmin Tang; Gang Xiong; Gaopeng Gou; Qi Wu

arXiv:2407.15613·cs.CV·July 24, 2024

TL;DR

This paper introduces a novel approach for document-based zero-shot learning that extracts and aligns multi-view semantic concepts from documents and images, improving performance by focusing on partial rather than full concept alignment.

Contribution

The work proposes a semantic decomposition network with specialized loss functions to enable partial alignment of visual and textual semantic concepts, addressing redundancy and diversity issues.

Findings

01. Outperforms state-of-the-art methods on three benchmarks

02. Learned partial associations are interpretable

03. Effective semantic concept extraction from documents and images

Abstract

Recent work shows that documents from encyclopedias serve as helpful auxiliary information for zero-shot learning. Existing methods align the entire semantics of a document with corresponding images to transfer knowledge. However, they disregard that semantic information is not equivalent between them, resulting in a suboptimal alignment. In this work, we propose a novel network to extract multi-view semantic concepts from documents and images and align the matching rather than entire concepts. Specifically, we propose a semantic decomposition module to generate multi-view semantic embeddings from visual and textual sides, providing the basic concepts for partial alignment. To alleviate the issue of information redundancy among embeddings, we propose the local-to-semantic variance loss to capture distinct local details and multiple semantic diversity loss to enforce orthogonality among…
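
The abstract mentions a loss that enforces orthogonality among the multi-view semantic embeddings but does not give its formula here. As a rough illustration only, the sketch below uses a common stand-in: the mean squared cosine similarity over all pairs of embeddings, which is zero when the embeddings are mutually orthogonal and grows as they become redundant. The function names `l2_normalize` and `diversity_penalty` and the toy vectors are hypothetical, and this is not claimed to be the paper's exact formulation or the code in the official repository.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper, not from the paper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def diversity_penalty(embeddings):
    """Mean squared cosine similarity over all distinct pairs of embeddings.

    Assumed stand-in for an orthogonality-style diversity loss:
    0.0 means every pair is orthogonal, 1.0 means every pair is collinear.
    """
    if len(embeddings) < 2:
        return 0.0
    unit = [l2_normalize(e) for e in embeddings]
    total = 0.0
    pairs = 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            total += cos * cos
            pairs += 1
    return total / pairs


# Toy "multi-view" embeddings: three mutually orthogonal views...
orthogonal = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
# ...versus a set where two views point in the same direction.
redundant = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

print(diversity_penalty(orthogonal))
print(diversity_penalty(redundant))
```

Minimizing a penalty of this kind pushes each embedding toward a distinct direction, which matches the abstract's stated goal of reducing information redundancy among embeddings. In a training setup the same quantity would typically be computed on batched tensors rather than Python lists.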

Code & Models

Repositories

morningstarovo/emdepart (PyTorch, official)

Taxonomy

Methods: ALIGN