Cross-Modal Subspace Learning with Scheduled Adaptive Margin Constraints
David Semedo, João Magalhães

TL;DR
This paper introduces a novel scheduled adaptive maximum-margin approach for cross-modal embeddings, improving the organization of multimodal data by adaptively enforcing semantic correlations during training.
Contribution
The paper proposes a scheduled adaptive margin formulation that infers triplet-specific constraints, enhancing cross-modal learning beyond static margin methods.
Findings
Achieved up to 12.5% improvement over state-of-the-art methods.
Demonstrated effectiveness on widely used datasets.
Enhanced organization of multimodal instances by adaptive margin enforcement.
Abstract
Cross-modal embeddings between textual and visual modalities aim to organise multimodal instances by their semantic correlations. State-of-the-art approaches use maximum-margin methods, based on the hinge loss, to enforce a constant margin m that separates projections of multimodal instances from different categories. In this paper, we propose a novel scheduled adaptive maximum-margin (SAM) formulation that infers triplet-specific constraints during training, thereby organising instances by adaptively enforcing inter-category and inter-modality correlations. This is supported by a scheduled adaptive margin function that is smoothly activated, replacing a static margin with an adaptively inferred one that reflects triplet-specific semantic correlations while accounting for the incremental learning behaviour of neural networks to enforce category cluster formation and enforcement.…
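To make the contrast between a constant margin m and a scheduled adaptive one concrete, here is a minimal sketch in plain Python. It is not the authors' SAM formulation, which the truncated abstract does not spell out. It assumes cosine similarity as the scoring function, a logistic curve as the "smooth activation", and a linear blend between the static and adaptive margins. The names `hinge_triplet`, `schedule`, `scheduled_margin`, `midpoint`, and `sharpness` are all hypothetical, and the per-triplet adaptive margin is simply passed in rather than inferred.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def hinge_triplet(anchor, positive, negative, margin):
    """Standard triplet hinge loss: max(0, margin - s(a, p) + s(a, n)).

    The loss is zero once the positive is more similar to the anchor
    than the negative by at least `margin`.
    """
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def schedule(step, midpoint, sharpness):
    """Hypothetical smooth activation in (0, 1), here a logistic curve.

    Close to 0 early in training, 0.5 at `midpoint`, close to 1 late.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (step - midpoint)))


def scheduled_margin(static_margin, adaptive_margin, step,
                     midpoint=1000, sharpness=0.01):
    """Blend from a static margin to a triplet-specific one (assumed form).

    `adaptive_margin` stands in for whatever per-triplet value a model
    would infer from semantic correlations; it is an input here.
    """
    w = schedule(step, midpoint, sharpness)
    return (1.0 - w) * static_margin + w * adaptive_margin


if __name__ == "__main__":
    image = [1.0, 0.0]           # anchor, e.g. an image embedding
    caption_same = [0.9, 0.1]    # text from the same category
    caption_other = [0.7, 0.7]   # text from a different category
    for step in (0, 1000, 5000):
        m = scheduled_margin(0.2, 0.6, step)
        loss = hinge_triplet(image, caption_same, caption_other, m)
        print(step, round(m, 4), round(loss, 4))
```

Under these assumptions the static margin dominates while the embedding space is still unstructured, and the triplet-specific margin takes over as training progresses, which is one plausible reading of "smoothly activated".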
