Towards Micro-video Thumbnail Selection via a Multi-label Visual-semantic Embedding Model
Liu Bo

TL;DR
This paper introduces a multi-label visual-semantic embedding model for selecting micro-video thumbnails that align with user interests, using a shared semantic space and attention mechanisms to improve relevance and attractiveness.
Contribution
The paper proposes a novel multi-label embedding approach with attention mechanisms to better match video frames with user interests for thumbnail selection.
Findings
Model significantly outperforms state-of-the-art baselines
Effective in capturing user interests through semantic embedding
Improves thumbnail relevance and attractiveness
Abstract
The thumbnail, as the first sight of a micro-video, plays a pivotal role in attracting users to click and watch. In real scenarios, the better a thumbnail satisfies users, the more likely the micro-video is to be clicked. In this paper, we aim to select the thumbnail of a given micro-video that meets most users' interests. To this end, we present a multi-label visual-semantic embedding model to estimate the similarity between each frame and the popular topics that users are interested in. In this model, the visual and textual information is embedded into a shared semantic space, whereby the similarity can be measured directly, even for unseen words. Moreover, to compare the frame to all words from the popular topics, we devise an attention embedding space associated with the semantic-attention projection. With the help of these two embedding spaces, the…
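The core idea in the abstract, scoring each frame against popular-topic words in a shared space and picking the best frame, can be illustrated with a minimal sketch. This is not the authors' model: it assumes frames and words have already been embedded as plain vectors in the same space, uses cosine similarity as the similarity measure, and substitutes a simple softmax weighting over words for the paper's attention embedding space. The function names (`cosine`, `softmax`, `frame_score`, `select_thumbnail`) and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def frame_score(frame_vec, word_vecs):
    """Attention-weighted similarity of one frame to all topic words.

    Assumption: attention weights are a softmax over the frame-word
    similarities, a stand-in for the paper's learned attention space.
    """
    sims = [cosine(frame_vec, w) for w in word_vecs]
    weights = softmax(sims)
    return sum(w * s for w, s in zip(weights, sims))


def select_thumbnail(frame_vecs, word_vecs):
    """Return (index of the best-scoring frame, list of all frame scores)."""
    scores = [frame_score(f, word_vecs) for f in frame_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy example: three frame embeddings and two topic-word embeddings (made up).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
words = [[1.0, 0.0], [0.9, 0.1]]
best_index, all_scores = select_thumbnail(frames, words)
```

In this toy setup the first frame points in nearly the same direction as both topic words, so it receives the highest score. A real system would obtain the frame and word vectors from learned visual and textual encoders rather than hand-written lists.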
Taxonomy
Topics: Video Analysis and Summarization · Advanced Computing and Algorithms · Misinformation and Its Impacts
