Why does Netflix know which movies you like?
An introduction to academia starting with “Why?”
After making a list of questions which would pique anyone’s interest, we asked faculty members on campus who might be knowledgeable in these areas to answer them from the perspective of their respective expertise. Let’s take a look into the world of research through questions that you feel you know something about, but cannot answer definitively when actually asked.
Q6. Why does Netflix know which movies you like?I am always amazed when Netflix and other subscription-based streaming services recommend movies that I like. How do they do it?
Professor, Graduate School of Information Science and Technology
Describing and solving real-world problems as mathematical models
I study generic algorithms (computational procedures) for mathematical optimization. Simply speaking, optimization is finding the best possible means of achieving an objective. The field originally emerged from thinking about how to inflict maximum damage on the enemy using limited weapons and soldiers during war. We isolate issues we want to solve in society, such as those related to building structure, computer shogi, setting up evacuation sites, inventory management, crane control and assigning students to laboratories, and then describe them as mathematical models by preparing suitable functions such as linear and quadratic functions. My research is to figure out how quickly such functions can be solved. To achieve this, it is necessary to create an algorithm that requires as little computation complexity as possible.
Optimization problems can broadly be classified into discrete and continuous optimization. For example, the task of finding the shortest route from Hongo-sanchome Station to Komaba-todaimae Station in the shortest time is a discrete optimization problem. On the other hand, the problem of whether a computer can automatically recognize handwritten characters by machine learning is a continuous optimization problem. They are solved in different ways with different tools.
The Netflix recommendation system utilizes continuous optimization. Suppose we have movie rating information from each user (a well-known dataset is MovieLens). Assuming that people with similar ratings have similar tastes, if there is a movie A that a particular user has not seen yet, Netflix recommends it by applying the rating given to A by people with similar tastes to that user. This is how recommendation works.
In technical terms, this is a missing value imputation problem using non-negative matrix factorization (a non-negative matrix is a matrix with all values greater than or equal to zero). A matrix containing the rating values by each user and for each movie is replaced by a product of two thin matrices. Once U and V are determined (in the equation below) by solving the optimization problem, the missing values can be displayed. If there are latent factors (preference patterns) such as “I like science fiction,” “I like Kurosawa,” and “I like romance,” and if they have a close relationship with both the user and the movie, it can be assumed that they rate the movies highly, and they are expected to like a movie they have not seen yet. However, the system does not need to determine what the latent factors are; it only needs the data.
The recommendation systems of video streaming services enjoyed by many are supported by the results of this kind of mathematical optimization research.
* This article was originally printed in Tansei 45 (Japanese language only). All information in this article is as of September 2022.