A Large-scale Dataset with Behavior, Attributes, and Content of Mobile Short-video Platform

Yu Shang; Chen Gao; Nian Li; Yong Li

arXiv:2502.05922·cs.MM·February 11, 2025

TL;DR

This paper introduces a comprehensive large-scale dataset from a mobile short-video platform, capturing user behavior, attributes, and content, to facilitate research in recommendation systems, social science, and human behavior analysis.

Contribution

The paper provides a large-scale dataset that addresses three gaps in existing datasets: inadequate user-video feedback, limited user attributes, and missing video content. The dataset is validated through four technical assessments.

Findings

01. The dataset covers 10,000 voluntary users and 153,561 videos.

02. Benchmarks of recommendation algorithms demonstrate the dataset's utility.

03. The dataset is used to analyze the filter bubble phenomenon.

Abstract

Short-video platforms have a growing impact on people's daily lives, with billions of active users spending considerable time on them each day. The interactions between users and online platforms give rise to many scientific problems across computational social science and artificial intelligence. However, despite the rapid development of short-video platforms, existing datasets have serious shortcomings in three respects: inadequate user-video feedback, limited user attributes, and a lack of video content. To address these problems, we provide a large-scale dataset with rich user behavior, attributes, and video content from a real mobile short-video platform. The dataset covers 10,000 voluntary users and 153,561 videos, and we conduct a four-fold technical validation of it. First, we verify the richness of the behavior and attribute data. Second, we…

Code & Models

Repositories

tsinghua-fib-lab/shortvideo_dataset
PyTorch·Official

Taxonomy

Topics: Advanced Computing and Algorithms