LinkedOut: Linking World Knowledge Representation Out of Video LLM for Next-Generation Video Recommendation
Haichao Zhang, Yao Lu, Lichen Wang, Yunzhe Li, Daiwei Chen, Yunpeng Xu, Yun Fu

TL;DR
LinkedOut is a video representation built on Video Large Language Models (VLLMs). It enables fast, multi-video, knowledge-aware video recommendation without relying on handcrafted labels, and it improves inference speed and interpretability.
Contribution
The paper presents LinkedOut, the first VLLM-based video recommendation method that operates directly on raw frames, supporting multi-video inputs and low-latency inference, with a novel cross-layer knowledge fusion mechanism.
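This summary does not say how the cross-layer knowledge fusion mechanism works. As a rough illustration of what fusing features across layers can look like, the sketch below combines one feature vector per layer using softmax-normalized weights. This is a generic technique chosen for illustration, not the paper's mechanism; the names `fuse_layers` and `layer_logits` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_features: Sequence[Vector],
                layer_logits: Sequence[float]) -> Vector:
    """Weighted sum of per-layer feature vectors.

    ASSUMPTION: a softmax-weighted layer mix, used here only as a stand-in
    for "cross-layer fusion". The paper's actual mechanism may differ.
    """
    if len(layer_features) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    dim = len(layer_features[0])
    fused = [0.0] * dim
    for weight, features in zip(weights, layer_features):
        if len(features) != dim:
            raise ValueError("all layers must share one feature dimension")
        for i, value in enumerate(features):
            fused[i] += weight * value
    return fused


# Two layers with equal logits contribute equally to the fused vector.
shallow = [1.0, 0.0]   # stand-in for features from an early layer
deep = [0.0, 1.0]      # stand-in for features from a late layer
fused = fuse_layers([shallow, deep], layer_logits=[0.0, 0.0])
```

In a trained model the logits would be learned parameters, so the network could decide how much early-layer visual detail to mix with late-layer semantic features.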
Findings
Achieves state-of-the-art results on standard benchmarks.
Supports multi-video histories for recommendation.
Enables interpretable and low-latency inference.
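To see why a representation-based approach can handle multi-video histories with low latency, consider the minimal sketch below. It assumes each video already has a precomputed embedding, so recommending reduces to pooling the history and ranking candidates by similarity, with no token-by-token generation. Mean pooling and cosine similarity are illustrative choices, not the paper's stated method, and every name here is hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(embeddings: Sequence[Vector]) -> Vector:
    """Average a user's watched-video embeddings into one profile vector."""
    if not embeddings:
        raise ValueError("history must contain at least one video")
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(history: Sequence[Vector],
                    candidates: Dict[str, Vector]) -> List[str]:
    """Return candidate ids sorted from most to least similar to the history.

    ASSUMPTION: embeddings are computed offline, one per video, so this
    step costs only a few vector operations per candidate.
    """
    profile = mean_pool(history)
    return sorted(candidates,
                  key=lambda vid: cosine(profile, candidates[vid]),
                  reverse=True)


# Toy data: the history points along the first axis, as does candidate "a".
history = [[1.0, 0.0], [0.8, 0.2]]
candidates = {"a": [1.0, 0.1], "b": [0.0, 1.0]}
ranking = rank_candidates(history, candidates)
```

Because the history is a list of vectors, adding more watched videos changes only the pooling step, which is the practical appeal of an embedding interface over a single-video text interface.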
Abstract
Video Large Language Models (VLLMs) unlock world-knowledge-aware video understanding through pretraining on internet-scale data and have already shown promise on tasks such as movie analysis and video question answering. However, deploying VLLMs for downstream tasks such as video recommendation remains challenging, since real systems require multi-video inputs, lightweight backbones, low-latency sequential inference, and rapid response. In practice, (1) decode-only generation yields high latency for sequential inference, (2) typical interfaces do not support multi-video inputs, and (3) constraining outputs to language discards fine-grained visual details that matter for downstream vision tasks. We argue that these limitations stem from the absence of a representation that preserves pixel-level detail while leveraging world knowledge. We present LinkedOut, a representation that extracts…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)
