I’ve been curious about the ‘type’ of movies I collect, so over the weekend I did some analysis. Here is the output graph (based on the genres of 350+ movies).
I used Python to read the file names, strip each one down to a meaningful title, send a request to IMDB or IMDBAPI, and collect the data in an Output.JSON file. I then used Node.js to filter that into a manageable JSON and Morris.js to draw it. BTW, Thriller takes first place.
Observations and Challenges:
The initial challenge was to get all the file names into the same format. This was mostly successful: around 80 file names errored out because they weren’t in the same format (even after the pruning and all).
Once you have the file name, it’s actually kinda difficult to look the movie up. This is because IMDB doesn’t have an official public API, though its search API is still the best one we have. I’ll speak about the shortcomings below. There is also an API from TMDb, but the registration is painful. There are two more APIs, one from Rotten Tomatoes and the other from www.imdbapi.org, but both of them need the exact name of the movie, which isn’t the case with the IMDB search API. We will be using both the IMDB API and www.imdbapi.org's API. The Rotten Tomatoes API is out because it doesn't give you the genre names.
The rest is fairly simple. Though I could’ve used Python for everything, I used Node.js to translate the Output.JSON file (explained below) into the JSON for the graph, just to get the hang of reading files with Node.js.
Steps Involved:
Read the file names from disk, and then prune them:
* Prune them, as per a common format:
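A minimal sketch of what this read-and-prune step could look like, not the original script. The junk-token list, the year heuristic, and the names `prune` and `movie_titles` are all assumptions made for illustration; real collections will need their own rules, which is probably why some names still fail.

```python
import re
from pathlib import Path

# Hypothetical release tags; everything from the first tag onward is dropped.
JUNK = re.compile(r"\b(720p|1080p|bluray|brrip|dvdrip|x264|xvid|hdrip)\b.*",
                  re.IGNORECASE)
# A release year (optionally bracketed) that is not at the very start of the name.
YEAR = re.compile(r"(?<=.)[\(\[]?\b(19|20)\d{2}\b[\)\]]?.*")


def prune(filename: str) -> str:
    """Reduce a raw file name to a best-guess movie title (heuristic only)."""
    name = Path(filename).stem            # drop the extension
    name = re.sub(r"[._]+", " ", name)    # dots/underscores used as separators
    name = JUNK.sub("", name)             # cut at the first release tag
    name = YEAR.sub("", name)             # cut at the release year, if any
    return re.sub(r"\s+", " ", name).strip(" -")


def movie_titles(folder: str) -> list:
    """Pruned, de-duplicated titles for every file directly inside folder."""
    return sorted({prune(p.name) for p in Path(folder).iterdir() if p.is_file()})
```

Titles that legitimately end in a year-like number will get clipped by the year rule, so expect to fix a few by hand.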
God, that was easy. When you run these scripts, your output.JSON will be populated with the JSON information for your movies.
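For the collection part, here is a rough sketch under stated assumptions rather than the script I ran. The `lookup` argument is a hypothetical function you supply yourself (for example, a wrapper around whichever movie API you picked) that takes a title and returns a dict, or None when nothing matches; `collect` is also just an illustrative name.

```python
import json
from typing import Callable, Iterable, Optional


def collect(titles: Iterable[str],
            lookup: Callable[[str], Optional[dict]],
            out_path: str = "output.JSON") -> list:
    """Look up each title, write the hits to out_path, return the misses."""
    found, missed = [], []
    for title in titles:
        try:
            record = lookup(title)   # hypothetical: returns a dict or None
        except Exception:            # network hiccup, bad response, etc.
            record = None
        if record:
            found.append(record)
        else:
            missed.append(title)
    # One JSON array holding every movie that was found.
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(found, fh, indent=2)
    return missed
```

Returning the misses makes it easy to see which file names still need manual cleanup before a second run.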
Lastly, make use of Node.js to organize the output.JSON by genre (could have used Python as well):
The arr in the above script will give you the count of each genre across all the movies you have, something like this:
[ { label: 'Action', value: 132 },
{ label: 'Comedy', value: 66 },
{ label: 'Adventure', value: 71 },
{ label: 'Sci-Fi', value: 53 },
{ label: 'Biography', value: 18 },
{ label: 'Drama', value: 174 },
{ label: 'History', value: 15 },
{ label: 'Romance', value: 41 },
{ label: 'Crime', value: 128 },
{ label: 'Mystery', value: 71 },
{ label: 'Thriller', value: 185 },
{ label: 'War', value: 21 },
{ label: 'Fantasy', value: 50 },
{ label: 'Horror', value: 30 },
{ label: 'Sport', value: 11 },
{ label: 'Family', value: 16 },
{ label: 'Western', value: 10 },
{ label: 'Animation', value: 11 },
{ label: 'Film-Noir', value: 1 },
{ label: 'Documentary', value: 2 },
{ label: 'Short', value: 3 },
{ label: 'Musical', value: 1 } ]
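Since the same tally could have been done in Python, here is a minimal sketch of that alternative. It assumes each movie record carries a list of genre names under a `genres` key; that field name and the function `genre_counts` are assumptions for illustration, so adjust them to whatever your lookup actually returns.

```python
import json
from collections import Counter


def genre_counts(movies):
    """Count how many movies list each genre, in first-seen order."""
    counts = Counter()
    for movie in movies:
        # "genres" is an assumed field name; missing entries count as no genres.
        counts.update(movie.get("genres", []))
    # Same label/value shape as the array shown above.
    return [{"label": genre, "value": n} for genre, n in counts.items()]


def genre_counts_from_file(path="output.JSON"):
    """Load the collected movie records and tally their genres."""
    with open(path, encoding="utf-8") as fh:
        return genre_counts(json.load(fh))
```

A movie with several genres is counted once under each of them, which is why the values add up to far more than the number of movies.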
Once you have those results with you, use morris.js to draw a Donut as above.
I was surprised; I never knew I watched Drama that much!