Data Mining Approach to Analysis and Prediction of Movie Success

Upeksha P. Kudagamage, Sabaragamuwa University of Sri Lanka, Belihuloya, Sri Lanka (upekshakg@gmail.com)
Banage T.G.S. Kumara, Sabaragamuwa University of Sri Lanka, Belihuloya, Sri Lanka (btgsk2000@gmail.com)
Chaminda H. Baduraliya, Sabaragamuwa University of Sri Lanka, Belihuloya, Sri Lanka (chamihb@gmail.com)

Abstract:

Data mining is an efficient approach to uncovering information that can either confirm or disprove common assumptions about movies, and it allows the success or failure of a future movie to be predicted from information known before its release. The main aim of this study is to apply data mining approaches to explore the attributes that affect the success or failure of a movie. Each data mining algorithm yields its own prediction accuracy. This study integrates four data mining algorithms (Decision Trees, Naïve Bayes, Support Vector Machine, Neural Networks) and an Ensemble approach to address the problem of movie success prediction, and demonstrates the correlation between the success or failure of a movie and attributes such as Opening Weekend Gross, Sequel, Theaters, Budget, Genre, Distributor, Country, IMDB Rating, MPAA Rating and Run Time. The prediction performance of these models was evaluated using Accuracy, Precision, Recall, F-Measure, Matthews Correlation Coefficient (MCC), ROC Area, PRC Area, Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Further, a spatial clustering technique called the Associated Keyword Space (ASKS), which is effective for noisy data, was applied in this study; the clustering result was projected from a three-dimensional (3D) sphere onto a two-dimensional (2D) spherical surface for 2D visualization. Similarities between movies were calculated using Cosine Similarity, and these affinity values were used as input to the clustering model. Movies were grouped by their degree of success into four clusters: Most Successful Movies, Successful Movies, Unsuccessful Movies and Least Successful Movies. Experimental results identify which of the movie attributes considered in this study most strongly affect the success or failure of a movie.
Moviemakers can use these results to identify the most influential movie attributes and take them into account in their future productions. In addition, a mathematical model based on the Correlation Coefficient is proposed in this study for predicting a movie’s success or failure.

Keywords: decision tree, naïve Bayes, neural networks, support vector machines, ensemble, spatial clustering
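
As a brief illustration of the Cosine Similarity step mentioned in the abstract, the following minimal sketch computes pairwise affinity values between movies represented as numeric attribute vectors. The movie names, the three-element vectors and the helper names (cosine_similarity, affinity_matrix) are hypothetical and chosen only for illustration; the attribute encoding and normalization actually used in the study are not stated here, so this should be read as one possible setup rather than the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Convention assumed here: a zero vector has similarity 0 with everything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def affinity_matrix(vectors):
    """Build the symmetric matrix of pairwise cosine similarities."""
    n = len(vectors)
    return [
        [cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical attribute vectors, assumed to be already scaled to [0, 1],
# e.g. (budget, sequel flag, rating). These are not data from the study.
movies = {
    "movie_a": [0.9, 1.0, 0.8],
    "movie_b": [0.8, 1.0, 0.7],
    "movie_c": [0.1, 0.0, 0.9],
}

names = list(movies)
affinity = affinity_matrix([movies[name] for name in names])

for name, row in zip(names, affinity):
    print(name, [round(value, 3) for value in row])
```

A matrix of this kind is the sort of affinity input that a clustering model such as ASKS would consume; movies with similar attribute profiles (movie_a and movie_b in this toy data) receive values close to 1.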