Short-Form Video Recommendations with Multimodal Embeddings: Addressing Cold-Start and Bias Challenges
Andrii Dzhoha, Katya Mirylenka, Egor Malykh, Marco-Andrea Buchmann, Francesca Catino

TL;DR
This paper explores the challenges of recommending short-form videos, such as cold-start and bias issues, and proposes a multimodal retrieval approach that outperforms traditional methods in an e-commerce setting.
Contribution
It introduces a multimodal vision-language retrieval system tailored for short-form video recommendations, addressing cold-start and bias challenges effectively.
Findings
Multimodal retrieval outperforms supervised learning in online tests.
Leveraging vision-language models mitigates position and duration biases.
The approach remains effective for new video experiences with limited interaction data.
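To make the cold-start point in the findings concrete, here is a minimal sketch of content-based retrieval over video embeddings. It is not the authors' system: the 3-dimensional vectors are hand-made stand-ins for what a vision-language model would produce, and the function names (`user_profile`, `retrieve`) and video IDs are hypothetical. The sketch only illustrates why a video with zero interactions can still be retrieved when ranking depends on content similarity rather than past engagement.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def user_profile(watched_embeddings):
    """Assumption: represent a user as the mean of watched-video embeddings."""
    dim = len(watched_embeddings[0])
    n = len(watched_embeddings)
    return [sum(e[i] for e in watched_embeddings) / n for i in range(dim)]


def retrieve(profile, catalog, k=2):
    """Return the k (video_id, score) pairs most similar to the profile."""
    scored = [(vid, cosine(profile, emb)) for vid, emb in catalog.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical values, not model output).
catalog = {
    "sneaker_review": [0.9, 0.1, 0.0],
    "running_tips": [0.8, 0.3, 0.1],
    "makeup_tutorial": [0.0, 0.2, 0.9],
    "new_trail_shoes": [0.85, 0.2, 0.05],  # just uploaded, no interactions yet
}

watched_ids = ("sneaker_review", "running_tips")
profile = user_profile([catalog[v] for v in watched_ids])

# Exclude already-watched videos from the candidate pool.
candidates = {v: e for v, e in catalog.items() if v not in watched_ids}
top = retrieve(profile, candidates, k=1)
print(top[0][0])
```

Because the score depends only on the embedding, the newly uploaded video competes on equal terms with established ones. A production system would typically replace the linear scan with an approximate nearest-neighbor index.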
Abstract
In recent years, social media users have spent significant amounts of time on short-form video platforms. As a result, established platforms in other domains, such as e-commerce, have begun introducing short-form video content to engage users and increase their time spent on the platform. The success of these experiences is due not only to the content itself but also to a unique UI innovation: instead of offering users a list of choices to click, platforms actively recommend content for users to watch one at a time. This creates new challenges for recommender systems, especially when launching a new video experience. Beyond the limited interaction data, immersive feed experiences introduce stronger position bias due to the UI and duration bias when optimizing for watch-time, as models tend to favor shorter videos. These issues, together with the feedback loop inherent in recommender…
Taxonomy
Topics: Multimodal Machine Learning Applications · Recommender Systems and Techniques · Video Analysis and Summarization
