Cross-modal Search Method of Technology Video based on Adversarial Learning and Feature Fusion
Xiangbin Liu, Junping Du, Meiyu Liang, Ang Li

TL;DR
This paper introduces a novel adversarial learning framework for cross-modal technology video search, effectively bridging the semantic gap between text and video features through feature fusion and mapping.
Contribution
It proposes FFACR (Feature Fusion based Adversarial Cross-modal Retrieval), a method that uses adversarial learning for multi-modal feature fusion and semantic space mapping to improve text-to-video retrieval accuracy.
Findings
Outperforms existing methods in text-to-video search accuracy
Effectively reduces the semantic gap between modalities
Validated on self-built technology video datasets
Abstract
Technology videos contain rich multi-modal information. In cross-modal information search, data features from different modalities cannot be compared directly, so the semantic gap between modalities is a key problem to be solved. To address this problem, this paper proposes a novel Feature Fusion based Adversarial Cross-modal Retrieval method (FFACR) for text-to-video matching, ranking and search. The method uses an adversarial learning framework in which a video multi-modal feature fusion network and a feature mapping network act as the generator, and a modality discrimination network acts as the discriminator. The feature fusion network produces the multi-modal features of each video. The feature mapping network projects these multi-modal features into a common semantic space based on semantics and similarity. The modality discrimination network is…
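The abstract describes the architecture only at a high level and is cut off before the training objectives are specified. The following is a minimal NumPy sketch of the forward pass under stated assumptions, not the authors' implementation. Assumed, not taken from the paper: two video modalities named `visual` and `audio`, concatenation followed by one linear layer as the fusion network, linear mapping layers into an L2-normalized common space, a logistic modality discriminator, a cosine-based pair loss standing in for the "semantics and similarity" objectives, and every dimension and function name. Only forward computations and loss values are shown; actual training would alternate discriminator and generator updates using an autodiff framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes (not from the paper).
D_VIS, D_AUD, D_TXT, D_FUSED, D_COMMON = 512, 128, 300, 256, 64

def init(shape):
    return rng.normal(0.0, 0.02, size=shape)

params = {
    "fuse_w": init((D_VIS + D_AUD, D_FUSED)), "fuse_b": np.zeros(D_FUSED),
    "map_v_w": init((D_FUSED, D_COMMON)), "map_v_b": np.zeros(D_COMMON),
    "map_t_w": init((D_TXT, D_COMMON)), "map_t_b": np.zeros(D_COMMON),
    "disc_w": init((D_COMMON, 1)), "disc_b": np.zeros(1),
}

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fuse_video(visual, audio, p):
    """Assumed fusion network: concatenate modality features, then project."""
    x = np.concatenate([visual, audio], axis=-1)
    return relu(x @ p["fuse_w"] + p["fuse_b"])

def map_video(fused, p):
    """Assumed video mapping network into the common semantic space."""
    return l2_normalize(fused @ p["map_v_w"] + p["map_v_b"])

def map_text(text, p):
    """Assumed text mapping network into the same common space."""
    return l2_normalize(text @ p["map_t_w"] + p["map_t_b"])

def discriminate(z, p):
    """Modality discriminator: probability that each embedding came from a video."""
    return sigmoid(z @ p["disc_w"] + p["disc_b"])[:, 0]

def bce(prob, label, eps=1e-8):
    return float(-np.mean(label * np.log(prob + eps) + (1 - label) * np.log(1 - prob + eps)))

def adversarial_losses(z_video, z_text, p):
    p_v, p_t = discriminate(z_video, p), discriminate(z_text, p)
    # Discriminator tries to label video=1 and text=0.
    d_loss = bce(p_v, 1.0) + bce(p_t, 0.0)
    # Generator (fusion + mapping) tries to make the modalities indistinguishable,
    # here via flipped labels (one common choice, assumed).
    g_loss = bce(p_v, 0.0) + bce(p_t, 1.0)
    return d_loss, g_loss

def pair_similarity_loss(z_video, z_text):
    """Stand-in similarity loss: pull matched text/video pairs together."""
    return float(np.mean(1.0 - np.sum(z_video * z_text, axis=1)))

def rank_videos(z_query, z_videos):
    """Text-to-video search: rank videos by cosine similarity to the query."""
    scores = z_videos @ z_query  # embeddings are unit-length, so this is cosine
    return np.argsort(-scores), scores

# Toy usage with random features for 4 matched text/video pairs.
n = 4
visual = rng.normal(size=(n, D_VIS))
audio = rng.normal(size=(n, D_AUD))
text = rng.normal(size=(n, D_TXT))
z_v = map_video(fuse_video(visual, audio, params), params)
z_t = map_text(text, params)
d_loss, g_loss = adversarial_losses(z_v, z_t, params)
sim_loss = pair_similarity_loss(z_v, z_t)
order, scores = rank_videos(z_t[0], z_v)
print("D loss:", d_loss, "G loss:", g_loss, "pair loss:", sim_loss)
print("ranking for query 0:", order.tolist())
```

Mapping both modalities onto the unit sphere lets a single dot product serve as the retrieval score, which keeps the search step independent of how the fusion and mapping networks are trained.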
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
