{"id":1814,"date":"2021-12-17T00:06:03","date_gmt":"2021-12-16T18:36:03","guid":{"rendered":"http:\/\/sites.iitgn.ac.in\/digitalstudies\/?page_id=1814"},"modified":"2021-12-17T00:09:37","modified_gmt":"2021-12-16T18:39:37","slug":"final-arpita-kabra","status":"publish","type":"page","link":"http:\/\/sites.iitgn.ac.in\/digitalstudies\/final-arpita-kabra\/","title":{"rendered":"Final Project"},"content":{"rendered":"<p><strong>A Content Based Movie Recommender using Kaggle<\/strong><\/p>\n<p>Recommendation systems are used on almost every online entertainment platform, such as YouTube, Netflix, and Spotify. Their main aim is to enhance the user experience by providing better suggestions based on users' activity, the content they view, similar content viewed by other people, etc. As part of this project, I have tried to create a basic version of a content-based movie recommendation system using the TMDB 5000 Movie Dataset available on Kaggle, which contains metadata for around 4800 unique movies, including movie overview, genre, cast and crew, ratings, votes, turnover, language, etc.<\/p>\n<h1>Kaggle as a Digital Tool<\/h1>\n<p>Over the past few years, as the scope of Machine Learning and Artificial Intelligence has grown, Kaggle has become a leading platform for collaboration among data scientists, whether beginners or researchers. The Kaggle community hosts free datasets in various fields, algorithms, code snippets, competitions involving real-life problems, etc. 
Launched in 2010 and acquired by Google in 2017, Kaggle allows users to explore anything related to Data Science and its applications, and it also serves as a free online learning platform.<\/p>\n<p style=\"text-align: center;\"><a href=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/final-arpita-kabra\/fa1\/\" rel=\"attachment wp-att-1815\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1815\" src=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA1.jpg\" alt=\"\" width=\"403\" height=\"242\" srcset=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA1.jpg 403w, http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA1-300x180.jpg 300w\" sizes=\"(max-width: 403px) 100vw, 403px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/image.slidesharecdn.com\/kaggleraddarslides2017-02-06-170307165813\/95\/tips-and-tricks-to-win-kaggle-data-science-competitions-1-638.jpg?cb=1488906029\"><u>https:\/\/image.slidesharecdn.com\/kaggleraddarslides2017-02-06-170307165813\/95\/tips-and-tricks-to-win-kaggle-data-science-competitions-1-638.jpg?cb=1488906029<\/u><\/a><\/p>\n<p>Currently, most online entertainment platforms collect data from their users to build user analytics and improve the quality of the user experience. The type of content we stream is often what is recommended to us. This is not magic, but a recommendation system that collects data from millions of users and produces results based on the demographics of the user, the content they view, etc. Such recommendations further shape our tastes and preferences, and the dynamic process continues. There is a recommendation system for every type of such big data where user taste matters! 
The news in our feed, the posts on Instagram, and Spotify and Netflix recommendations are all the result of Machine Learning processes.<\/p>\n<p><strong>In this project I have created a simple recommendation system for a dataset of around 4800 movies from the popular \u2018The Movie Database (TMDb)\u2019. The dataset is hosted on Kaggle, as is the tutorial followed for the project.<\/strong><\/p>\n<h1>Algorithm Overview<\/h1>\n<p>I have implemented a <strong>Content-Based Recommendation System<\/strong>.<\/p>\n<p>In this type of recommender system, the content of a movie is used to find other similar movies. For example, if a viewer watches a romance movie, they will be recommended movies of the \u2018romance\u2019 genre. If the movie has both romance and comedy as genres, the recommendations will be adjusted accordingly. If a viewer likes to watch Leonardo DiCaprio movies, they will often be recommended movies featuring the same actor. 
A pictorial representation of the system is as follows:<\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: center;\"><a href=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/final-arpita-kabra\/fa2\/\" rel=\"attachment wp-att-1816\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1816 aligncenter\" src=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA2.jpg\" alt=\"\" width=\"380\" height=\"200\" srcset=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA2.jpg 380w, http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA2-300x158.jpg 300w\" sizes=\"(max-width: 380px) 100vw, 380px\" \/><\/a><br \/>\n<a href=\"https:\/\/www.ntt-review.jp\/archive_html\/200804\/images\/le1_fig02.gif\"><u>https:\/\/www.ntt-review.jp\/archive_html\/200804\/images\/le1_fig02.gif<\/u><\/a><\/p>\n<h2>1)\u00a0\u00a0\u00a0 Data Cleaning and Modifying<\/h2>\n<p>Data cleaning refers to removing unhelpful content from the data. For example, while extracting relevant words from a movie's overview, we must ignore articles, pronouns, conjunctions, connector words and other such words that do not provide useful information. This is handled by the stop-word removal option (\u2018stop_words\u2019) of the text vectorizers in the sklearn library in Python.<\/p>\n<p>Data modifying refers to extracting the given data into a structure such as a list, dictionary or tuple according to our requirements and stripping off other unnecessary labels.<\/p>\n<h2>2)\u00a0\u00a0\u00a0 TF-IDF Vectorizer<\/h2>\n<p>TF-IDF, which stands for \u2018<strong>Term Frequency Inverse Document Frequency<\/strong>\u2019, is a statistic that measures the importance of a term in a document relative to a collection of documents. The importance is represented as a normalized numerical quantity. The more often a term occurs in a document, the higher its TF-IDF value, but this is offset by the number of documents in the collection that contain the term, so words common to most overviews receive low weights. This statistic is applied to the overviews of the movies. 
Relevant and useful words in the overview list are assigned a numerical weight based upon the total available data and the frequency of their occurrence.<\/p>\n<p>Apart from this method, a <strong>Count vectorizer<\/strong> is also available; it uses raw term counts without inverse-document-frequency weighting, so it does not down-weight a feature that appears in many movies.<\/p>\n<h2>3)\u00a0\u00a0\u00a0 Cosine Similarity<\/h2>\n<p>The cosine similarity approach measures the similarity between two vectors as the cosine of the angle between them. The more similar they are, the higher the resulting value. In our recommender system, we use it to measure the similarity between the overview of the movie for which recommendations are to be generated and the overviews of the other movies in the dataset, based upon the TF-IDF vectors calculated previously. The cosine similarity formula is given as follows:<\/p>\n<p style=\"text-align: center;\"><a href=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/final-arpita-kabra\/fa3\/\" rel=\"attachment wp-att-1817\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1817\" src=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA3.jpg\" alt=\"\" width=\"428\" height=\"112\" srcset=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA3.jpg 428w, http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA3-300x79.jpg 300w\" sizes=\"(max-width: 428px) 100vw, 428px\" \/><\/a><\/p>\n<p>In the initial phase of the project, only the overview of movies is used to generate the recommendations. Later, this was extended to include the genre and other movie features as well.<\/p>\n<p><strong>The entire project has been implemented on Google Colab, which allows Python code to be executed and shared online. 
The viewing link for the commented code is as follows:<\/strong><\/p>\n<p><a href=\"https:\/\/colab.research.google.com\/drive\/1RuXwBV9G67O4pldKVjemMetvcqhwD-Bp?usp=sharing\"><strong><u>Content based Movie Recommender System<\/u><\/strong><\/a><\/p>\n<h1>Results<\/h1>\n<p>Some of the interesting results generated by the recommendation system are as follows:<\/p>\n<h2>1)<\/h2>\n<p style=\"text-align: center;\"><a href=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/final-arpita-kabra\/fa4\/\" rel=\"attachment wp-att-1818\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1818\" src=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA4.png\" alt=\"\" width=\"473\" height=\"213\" srcset=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA4.png 473w, http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA4-300x135.png 300w\" sizes=\"(max-width: 473px) 100vw, 473px\" \/><\/a><\/p>\n<h2>Using the overview of movies<\/h2>\n<p><strong>Using the genre and other features of movies<\/strong><\/p>\n<p><a href=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/final-arpita-kabra\/fa5\/\" rel=\"attachment wp-att-1819\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1819 aligncenter\" src=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA5.png\" alt=\"\" width=\"576\" height=\"210\" srcset=\"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA5.png 576w, http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-content\/uploads\/2021\/12\/FA5-300x109.png 300w\" sizes=\"(max-width: 576px) 100vw, 576px\" \/><\/a><\/p>\n<h1>Scope of Improvement<\/h1>\n<p>The recommendation system can be improved by adding more movies to the dataset and using more features to generate the 
recommendations. Content-based filtering can also be combined with collaborative filtering, in which the preferences of other users also play a role in generating recommendations.<\/p>\n<h1>Correspondence with Digital Cultures<\/h1>\n<p>As part of the project, I have tried to illustrate a small use of the multi-featured free tool Kaggle, which can be used in various domains pertaining to Data Science and its development. Also, the topic chosen for the project is a prototype of the widely used complex algorithms that generate recommendations. Our data is collected and interpreted at various levels for various purposes. Further, our large viewership creates demand for certain types of content, which creators take note of. Such backend algorithms shape the development of digital content and culture at a large scale.<\/p>\n<h1>References<\/h1>\n<p><a href=\"https:\/\/www.kaggle.com\/tmdb\/tmdb-movie-metadata\"><u>https:\/\/www.kaggle.com\/tmdb\/tmdb-movie-metadata<\/u><\/a><br \/>\n<a href=\"https:\/\/www.kaggle.com\/ibtesama\/getting-started-with-a-movie-recommendation-system\/data\"><u>https:\/\/www.kaggle.com\/ibtesama\/getting-started-with-a-movie-recommendation-system\/data<\/u><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A Content Based Movie Recommender using Kaggle Recommendation Systems are used in almost every form of available online entertainment platforms such as YouTube, Netflix, Spotify, etc. 
The main aim is to enhance user experience by providing them better suggestions based on their activity, the content [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1814","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-json\/wp\/v2\/pages\/1814"}],"collection":[{"href":"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-json\/wp\/v2\/comments?post=1814"}],"version-history":[{"count":3,"href":"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-json\/wp\/v2\/pages\/1814\/revisions"}],"predecessor-version":[{"id":1823,"href":"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-json\/wp\/v2\/pages\/1814\/revisions\/1823"}],"wp:attachment":[{"href":"http:\/\/sites.iitgn.ac.in\/digitalstudies\/wp-json\/wp\/v2\/media?parent=1814"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}