Cross-modal Search Method of Technology Video based on Adversarial Learning and Feature Fusion

Xiangbin Liu; Junping Du; Meiyu Liang; Ang Li

arXiv:2210.05243 · cs.IR · October 12, 2022

TL;DR

This paper introduces a novel adversarial learning framework for cross-modal technology video search, effectively bridging the semantic gap between text and video features through feature fusion and mapping.

Contribution

It proposes FFACR (Feature Fusion based Adversarial Cross-modal Retrieval), a method that uses adversarial learning for multi-modal feature fusion and semantic space mapping to improve text-to-video retrieval accuracy.
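The retrieval side of this idea can be sketched in plain Python. Each video's per-modality features are fused, text and video are mapped into one semantic space, and videos are ranked by similarity to the text query. This is a minimal illustration under assumptions that do not come from the paper:

- fusion is plain concatenation;
- both mappings are fixed linear projections;
- similarity is cosine similarity;
- all names (`fuse`, `project`, `rank_videos`) and toy values are hypothetical.

In FFACR the fusion and mapping networks are learned, so the sketch shows only the shape of the computation.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def fuse(modal_features: Sequence[Vector]) -> Vector:
    """Hypothetical fusion: concatenate per-modality video features (e.g. visual, audio)."""
    fused: Vector = []
    for feats in modal_features:
        fused.extend(feats)
    return fused


def project(weights: Matrix, x: Vector) -> Vector:
    """Linear mapping W @ x into the shared semantic space."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_videos(query: Vector, videos: Dict[str, List[Vector]],
                w_text: Matrix, w_video: Matrix) -> List[str]:
    """Map the text query and each fused video into the common space, then rank by similarity."""
    q = project(w_text, query)
    scores = {vid: cosine(q, project(w_video, fuse(mods))) for vid, mods in videos.items()}
    return sorted(scores, key=lambda vid: scores[vid], reverse=True)


# Toy example: 2-d text features; each video has a 2-d visual and a 1-d audio feature.
w_text = [[1.0, 0.0], [0.0, 1.0]]
w_video = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]]
videos = {
    "lecture": [[1.0, 0.0], [0.2]],
    "demo": [[0.0, 1.0], [0.0]],
}
ranking = rank_videos([0.0, 1.0], videos, w_text, w_video)
print(ranking)  # ['demo', 'lecture']
```

The text and video mappings are kept separate because the two modalities start in different feature spaces. Comparison only becomes meaningful after both are projected into the common space.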

Findings

1. Outperforms existing methods in text-to-video search accuracy.
2. Effectively reduces the semantic gap between modalities.
3. Validated on self-built technology video datasets.

Abstract

Technology videos contain rich multi-modal information. In cross-modal information search, features from different modalities cannot be compared directly, so the semantic gap between modalities is a key problem to solve. To address it, this paper proposes a novel Feature Fusion based Adversarial Cross-modal Retrieval method (FFACR) for text-to-video matching, ranking and searching. The method adopts an adversarial learning framework. A video multi-modal feature fusion network and a feature mapping network together act as the generator, and a modality discrimination network acts as the discriminator. The feature fusion network extracts the multi-modal features of each video. The feature mapping network projects these multi-modal features into the same semantic space based on semantics and similarity. The modality discrimination network is…
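The adversarial setup in the abstract can be illustrated with a generic minimax sketch. There, fusion and mapping act as the generator and a modality discriminator acts as the discriminator. None of the following assumptions are taken from the paper:

- the discriminator is a logistic classifier over common-space embeddings;
- it is trained with binary cross-entropy, labelling text as 1 and video as 0;
- the generator's adversarial term is simply the negated discriminator loss, one common minimax choice.

All function names and values are hypothetical. Only the discriminator update is shown. A generator update would backpropagate through the fusion and mapping networks, which this stdlib sketch does not model.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def discriminate(w: Vector, b: float, x: Vector) -> float:
    """Hypothetical logistic modality discriminator: P(x came from the text branch)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce(p: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy, with p clipped away from 0 and 1 for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def labelled(text_emb, video_emb):
    """Pair each common-space embedding with its modality label (text=1, video=0)."""
    return [(t, 1) for t in text_emb] + [(v, 0) for v in video_emb]


def discriminator_loss(w, b, text_emb, video_emb) -> float:
    data = labelled(text_emb, video_emb)
    return sum(bce(discriminate(w, b, x), y) for x, y in data) / len(data)


def generator_adversarial_loss(w, b, text_emb, video_emb) -> float:
    """Assumed minimax form: the generator is rewarded when the discriminator's loss is high."""
    return -discriminator_loss(w, b, text_emb, video_emb)


def discriminator_step(w, b, text_emb, video_emb, lr=1.0):
    """One gradient-descent step on the BCE loss; for a logistic model dLoss/dz = p - y."""
    data = labelled(text_emb, video_emb)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in data:
        err = discriminate(w, b, x) - y
        grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
        grad_b += err
    n = len(data)
    new_w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return new_w, b - lr * grad_b / n


# Toy common-space embeddings: text and video still occupy different regions.
text_emb = [[1.0, 0.0], [0.8, 0.2]]
video_emb = [[0.0, 1.0], [0.2, 0.8]]
w0, b0 = [0.0, 0.0], 0.0
loss_before = discriminator_loss(w0, b0, text_emb, video_emb)  # ln 2: D outputs 0.5 everywhere
w1, b1 = discriminator_step(w0, b0, text_emb, video_emb)
loss_after = discriminator_loss(w1, b1, text_emb, video_emb)
```

After one step, the discriminator's loss drops below ln 2 because the toy embeddings are still separable by modality. During training, the generator would work against this, reshaping the embeddings so that the discriminator cannot tell text from video.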

Taxonomy

Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications