PatchCT: Aligning Patch Set and Label Set with Conditional Transport for Multi-Label Image Classification

Miaoge Li; Dongsheng Wang; Xinyang Liu; Zequn Zeng; Ruiying Lu; Bo Chen; Mingyuan Zhou

arXiv:2307.09066·cs.CV·August 21, 2023

TL;DR

PatchCT casts multi-label image classification as a conditional transport (CT) problem that aligns image patch embeddings with label embeddings. The authors report better interpretability and performance without carefully designed attention-based alignment modules.

Contribution

The paper proposes a new method applying conditional transport to align patch and label sets, eliminating the need for complex attention-based alignment modules.

Findings

01 Outperforms previous methods on three public benchmarks

02 Provides interpretable visualization of learned prototypes

03 Efficiently models patch-label interactions via bidirectional CT

Abstract

Multi-label image classification is a prediction task that aims to identify more than one label from a given image. This paper considers the semantic consistency of the latent space between the visual patch and linguistic label domains and introduces conditional transport (CT) theory to bridge the acknowledged gap. While recent cross-modal attention-based studies have attempted to align these two representations and have achieved impressive performance, they require carefully designed alignment modules and extra complex operations in the attention computation. We find that by formulating multi-label classification as a CT problem, we can exploit the interactions between the image and label efficiently by minimizing the bidirectional CT cost. Specifically, after feeding the images and textual labels into the modality-specific encoders, we view each image as a mixture of patch…
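The abstract is cut off before it defines the cost, so the following is only a rough sketch of what a bidirectional CT cost between a patch set and a label set could look like, not the authors' implementation. It assumes cosine distance as the point-to-point cost, softmax-normalized cosine similarity as the conditional transport plan in each direction, and uniform weights over patches and labels. The function name bidirectional_ct_cost and all parameter names are hypothetical.

```python
import numpy as np


def _softmax(z, axis):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def bidirectional_ct_cost(patches, labels):
    """Illustrative bidirectional CT cost (assumed form, not from the paper).

    patches: (N, d) array of patch embeddings.
    labels:  (M, d) array of label embeddings.
    """
    # Assumption: cosine similarity measures patch-label affinity.
    p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    q = labels / np.linalg.norm(labels, axis=1, keepdims=True)
    sim = p @ q.T            # (N, M) similarities
    cost = 1.0 - sim         # (N, M) cosine distances

    n, m = sim.shape
    # Forward direction: each patch distributes its mass over the labels.
    forward_plan = _softmax(sim, axis=1)      # rows sum to 1
    forward = (forward_plan * cost).sum() / n
    # Backward direction: each label distributes its mass over the patches.
    backward_plan = _softmax(sim, axis=0)     # columns sum to 1
    backward = (backward_plan * cost).sum() / m
    return forward + backward


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch_emb = rng.normal(size=(196, 64))   # e.g. a 14x14 patch grid
    label_emb = rng.normal(size=(20, 64))    # e.g. 20 candidate labels
    print(bidirectional_ct_cost(patch_emb, label_emb))
```

Under these assumptions the cost shrinks as patch and label embeddings point in matching directions, which is the behavior a training loss of this kind would rely on.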

Code & Models

Repositories

keepgoingjkg/patchct (Official)

Taxonomy

Topics: Text and Document Classification Technologies · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: ALIGN