IMDb data from Two Generations, from 1979 to 2019; Part one, Dataset Introduction and Preliminary Analysis

M. Bahraminasr; A. Vafaei Sadr

arXiv:2005.14147·cs.CY·September 8, 2020

TL;DR

This paper introduces a comprehensive IMDb dataset covering over 79,000 titles from 1979 to 2019 and provides a preliminary analysis of trends, demographics, and factors influencing movie success.

Contribution

It creates the largest IMDb dataset to date and offers initial insights into data trends, demographics, and success factors using statistical and machine learning methods.

Findings

01 Identified trends in IMDb data over four decades

02 Analyzed demographic patterns of IMDb scores

03 Explored relationships between genre, ratings, and success factors

Abstract

IMDb, a user-regulated site and one of the most-visited portals, has provided an opportunity to create an enormous database. Analyzing the information on the Internet Movie Database (IMDb), whether related to the movie itself or provided by users, would help reveal the factors that determine each movie's success. Because a comprehensive dataset was lacking, we decided to create a compendious dataset for later analysis using statistical methods and machine learning models. It comprises various information provided on IMDb, such as rating data, genre, cast and crew, MPAA rating certificate, parental guide details, related movie information, posters, etc., for over 79k titles, which makes it the largest such dataset to date. The present paper is the first in a series of papers aiming at the mentioned goals, by a description of the created dataset and a…

Code & Models

Repositories

mjdbahram/IMDb-sample-data (official)

Models

Bictole/NLP_DEEP_2 (Hugging Face model, 3 downloads)

Taxonomy

Topics: Media Influence and Politics · Authorship Attribution and Profiling · Misinformation and Its Impacts