IMDb data from Two Generations, from 1979 to 2019; Part one, Dataset Introduction and Preliminary Analysis
M. Bahraminasr, A. Vafaei Sadr

TL;DR
This paper introduces a comprehensive IMDb dataset covering over 79,000 titles released from 1979 to 2019, and provides a preliminary analysis of trends, demographics, and factors influencing movie success.
Contribution
It creates what the authors describe as the largest IMDb dataset to date, and offers initial insights into data trends, demographics, and success factors using statistical and machine learning methods.
Findings
Identified trends in IMDb data over four decades
Analyzed demographic patterns of IMDb scores
Explored relationships between genre, ratings, and success factors
Abstract
"IMDb", a user-regulated portal and one of the most-visited sites on the web, has provided an opportunity to build an enormous database. Analyzing the information on the Internet Movie Database (IMDb), whether it relates to the movie itself or is provided by users, would help reveal the determining factors on each movie's path to success. Because no comprehensive dataset existed, we set out to create a compendious dataset for later analysis with statistical methods and machine learning models. It comprises various information provided on IMDb, such as rating data, genre, cast and crew, MPAA rating certificate, parental guide details, related movie information, posters, etc., for over 79k titles, making it the largest such dataset to date. The present paper is the first in a series of papers aiming at the mentioned goals, by a description of the created dataset and a…
Taxonomy
Topics: Media Influence and Politics · Authorship Attribution and Profiling · Misinformation and Its Impacts
