Topology-Aware Representation Alignment for Semi-Supervised Vision-Language Learning

Junwon You; Mihyun Jang; Sangwoo Mo; Jae-Hun Jung

arXiv:2604.26370·cs.CV·April 30, 2026

TL;DR

This paper introduces ToMA, a topology-aware framework for semi-supervised vision-language learning. It uses persistent homology to identify topologically salient edges in the image and text representations and aligns those edges across modalities, and it reports stable gains over existing methods.

Contribution

It proposes a topology-based alignment method that uses persistent homology to capture multimodal structure through edges alone (H_0-death and H_1-birth edges), without relying on higher-dimensional simplices.

Findings

01. ToMA improves performance on remote sensing tasks.

02. ToMA provides stable gains over existing methods.

03. Lightweight H_1-birth edges capture useful higher-order structures.

Abstract

Vision-language models have shown strong performance, but they often generalize poorly to specialized domains. While semi-supervised vision-language learning mitigates this limitation by leveraging a small set of labeled image-text pairs together with abundant unlabeled images, existing methods remain fundamentally pairwise and fail to model the global structure of multimodal representation manifolds. Existing topology-based alignment methods rely on persistence diagram matching, which neither guarantees geometric alignment nor utilizes the image-text pairing information central to vision-language learning. We propose Topology-Aware Multimodal Representation Alignment (ToMA), a framework that uses persistent homology to identify topologically salient edges and aligns them across modalities through available cross-modal correspondences. ToMA leverages both H_0-death edges and lightweight…
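The idea of "topologically salient edges" can be made concrete for the H_0 case. In a Vietoris-Rips filtration, the edges that kill H_0 classes are exactly the edges of a minimum spanning tree of the point cloud, so they can be found with Kruskal's algorithm. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the loss (squared mismatch of edge lengths between paired embeddings) is an assumed stand-in because the excerpt does not give ToMA's actual objective, and H_1-birth edges are not covered.

```python
import math
from itertools import combinations


def pairwise_dist(points):
    """Dense Euclidean distance matrix for a list of equal-length vectors."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def h0_death_edges(dist):
    """Edges that merge two components in the Vietoris-Rips filtration.

    These are the minimum spanning tree edges (Kruskal with union-find);
    the length of each edge is the death time of one H_0 class.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = []
    by_length = sorted(combinations(range(n), 2), key=lambda e: dist[e[0]][e[1]])
    for i, j in by_length:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge joins two components, so one H_0 class dies here
            parent[ri] = rj
            edges.append((i, j))
    return edges


def edge_alignment_loss(img, txt):
    """Hypothetical loss: mean squared length mismatch on H_0-death edges.

    Assumption: row k of `img` and row k of `txt` form a matched image-text pair,
    so an edge (i, j) in one modality can be compared with (i, j) in the other.
    """
    d_img, d_txt = pairwise_dist(img), pairwise_dist(txt)
    edges = set(h0_death_edges(d_img)) | set(h0_death_edges(d_txt))
    if not edges:
        return 0.0
    return sum((d_img[i][j] - d_txt[i][j]) ** 2 for i, j in edges) / len(edges)
```

Because the loss compares the same index pair (i, j) in both modalities, it depends on the image-text pairing, which persistence diagram matching does not use. A training version would compute the distances with a differentiable tensor library and treat the selected edges as fixed indices within each step.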
