- Title
Early Success Prediction of Indian Movies Using Subtitles: A Document Vector Approach.
- Authors
Rahul, Vaddadi Sai; Tejas, M.; Prasanth, N. Narayanan; Raja, S. P.
- Abstract
Scientific studies of the factors that influence the box office performance of Indian films have generally concentrated on post-production factors, that is, those known only after a film has been completed or released, and mostly on Bollywood films. Fewer studies have looked at regional film industries and pre-production factors, which are known before a decision to greenlight a film is made. This study used natural language processing and machine learning approaches to predict, at the pre-production stage, whether Indian films would be profitable. We extracted movie data and English subtitles (as an approximation of the screenplay) for the top five Indian regional film industries: Bollywood, Kollywood, Tollywood, Mollywood, and Sandalwood, which together make up a major portion of the Indian film industry's revenue. Subtitle Vector (Sub2Vec), a Paragraph Vector model trained on English subtitles, was used to embed subtitle text into 50 and 100 dimensions. The proposed approach followed a two-stage pipeline. In the first stage, Return on Investment (ROI) was estimated from aggregated subtitle embeddings and associated movie data. In the second stage, classification models used the ROI from the first stage to predict a film's verdict. The optimal regressor–classifier pair was determined by evaluating the classification models with F1-score and Cohen's Kappa across various hyperparameter settings. Compared with benchmark methods, the proposed methodology forecasts box office success more accurately.
- Subjects
NATURAL language processing; INDIAN films; FORECASTING methodology; BUSINESS revenue; MOTION picture industry
- Publication
International Journal of Image & Graphics, 2023, Vol 23, Issue 4, p1
- ISSN
0219-4678
- Publication type
Article
- DOI
10.1142/S0219467823500304
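The abstract names F1-score and Cohen's Kappa as the metrics used to compare classifiers on a film's verdict. As a rough illustration only, the sketch below computes both metrics in plain Python for a toy two-class verdict derived from ROI. The ROI formula, the zero threshold, the two-class verdict, and all numbers are assumptions made for this example; they are not taken from the paper, which may define ROI and verdict classes differently.

```python
from collections import Counter


def roi(revenue, budget):
    """ROI as a fraction of budget (assumed definition, not from the paper)."""
    return (revenue - budget) / budget


def verdict(roi_value, threshold=0.0):
    """Hypothetical two-class verdict: 1 = profitable, 0 = not profitable."""
    return 1 if roi_value > threshold else 0


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(y_true, y_pred):
    """Agreement between two labelings, corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is a convention for this sketch
    return (observed - expected) / (1 - expected)


# Made-up ROI values standing in for actual and stage-one predicted ROI.
actual_roi = [0.5, -0.2, 1.0, -0.5, 0.1, -0.1]
predicted_roi = [0.3, 0.1, 0.8, -0.4, -0.2, -0.3]

y_true = [verdict(r) for r in actual_roi]
y_pred = [verdict(r) for r in predicted_roi]

f1 = f1_score(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(f"F1 = {f1:.3f}, Cohen's kappa = {kappa:.3f}")
```

Cohen's Kappa is a useful companion to F1 because it discounts the agreement a classifier would reach by chance, which matters when verdict classes are imbalanced.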