Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning

Yi Bin; Junrong Liao; Yujuan Ding; Haoxuan Li; Yang Yang; See-Kiong Ng; Heng Tao Shen

arXiv:2408.00305 · cs.MM · August 2, 2024

TL;DR

This paper introduces WeGO, an iterative learning approach that uses high-confidence pairwise order predictions in one modality to guide coherence modeling in the other, improving cross-modal coherence modeling without requiring gold coherence labels for the guidance.

Contribution

The paper proposes WeGO (Weak Cross-Modal Guided Ordering), a method for cross-modal coherence modeling that combines weak guidance from high-confidence predictions with iterative joint optimization, so that cross-modal guidance can be used even when gold coherence labels for it are unavailable.

Findings

1. Outperforms existing cross-modal coherence methods on two datasets.

2. Ablation studies validate the key modules of the proposed approach.

3. Iterative boosting improves coherence prediction accuracy.

Abstract

Cross-modal coherence modeling is essential for intelligent systems: it helps them organize and structure information, and thereby understand and create content about the physical world as coherently as humans do. Previous work on cross-modal coherence modeling attempted to leverage the order information from another modality to assist coherence recovery in the target modality. Despite its effectiveness, labeled coherence information for the associated modality is not always available and can be costly to acquire, making the cross-modal guidance hard to leverage. To tackle this challenge, this paper explores a new way to take advantage of cross-modal guidance without gold coherence labels, and proposes the Weak Cross-Modal Guided Ordering (WeGO) model. More specifically, it leverages high-confidence predicted pairwise order in one modality as reference information to guide the coherence…
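
The weak-guidance idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the two modalities are element-aligned (sentence i corresponds to image i), that each modality's model outputs a probability that item i precedes item j, and it uses a hypothetical confidence threshold of 0.9 with a binary cross-entropy guidance term. The function names `select_confident_pairs` and `guidance_loss` are illustrative only.

```python
import math


def select_confident_pairs(pair_probs, threshold=0.9):
    """Keep only pairwise order predictions that are confident either way.

    pair_probs maps (i, j) to the predicted probability that item i
    precedes item j in the guiding modality. Confident pairs become
    pseudo-labels: 1.0 (i before j) or 0.0 (j before i).
    The threshold is an assumed hyperparameter.
    """
    guidance = {}
    for pair, p in pair_probs.items():
        if p >= threshold:
            guidance[pair] = 1.0
        elif p <= 1.0 - threshold:
            guidance[pair] = 0.0
    return guidance


def guidance_loss(target_probs, guidance, eps=1e-12):
    """Mean binary cross-entropy between the target modality's pairwise
    predictions and the pseudo-labels taken from the guiding modality."""
    if not guidance:
        return 0.0
    total = 0.0
    for pair, label in guidance.items():
        p = min(max(target_probs[pair], eps), 1.0 - eps)
        total -= label * math.log(p) + (1.0 - label) * math.log(1.0 - p)
    return total / len(guidance)


# Toy example: three aligned image/sentence items.
image_probs = {(0, 1): 0.97, (0, 2): 0.55, (1, 2): 0.04}  # guiding modality
text_probs = {(0, 1): 0.60, (0, 2): 0.50, (1, 2): 0.30}   # target modality

guidance = select_confident_pairs(image_probs)
loss = guidance_loss(text_probs, guidance)
print(guidance)
print(round(loss, 4))
```

In this toy example the uncertain pair (0, 2) is dropped, so only the two confident pairs contribute to the guidance term. Under the same assumptions, an iterative scheme could swap the roles of the two modalities from round to round, so that each modality's improved predictions serve as the next round's reference.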

Code & Models

Repositories

scvready123/iterwego
PyTorch · Official
