Bangda Sun

Practice makes perfect

Recommender Systems Overview - From A Survey

[Paper reading]: Towards the Next Generation of Recommender Systems - A Survey of the State-of-the-Art and Possible Extensions.

This is an old paper (2005), therefore the “State-of-the-Art” is actually not very true. But it provides a complete overview of the field of recommender systems, mainly about recommendation methods (more classical, since deep learnings are not very popular at that time), as well as some extensions including better methods for representing user behavior and item information , more advanced modeling methods, incorporation of various contextual informations, etc.

1. The Survey of Recommender Systems

Recommender systems emerged as an independent research area in the mid-1990s when researchers started focusing on recommendation problems that explicitly rely on the ratings structures. In most common formulation, recommendation is reduced to estimating ratings for items that have not been seen by users, once the estimations are generated, users will be recommended the items with the highest estimated ratings.

More formally, the recommendation problem can be formulated as follows: let \(C\) be the set of all users and let \(S\) be the set of all items can be recommended, let \(u\) be a utility function that measures the usefulness of item \(s\) to user \(c\):

\[
u: C \times S \rightarrow R,
\]

where \(R\) is a ordered set, i.e. the recommended list. Then for each user \(c\in C\), choose item \(s’\in S\) such that it could maximize the utility,

\[
s’_{c} = \arg\max_{s\in S} u(c, s), \forall c\in C.
\]

Each element of the user space \(C\) can be defined with a profile that includes various user characteristics, such as age, gender, income, martial status, etc. Similarly, each element of the item space \(S\) can be defined with a set of characteristics. For example, in movie recommender system, it could be movie title, genre, director, year of release, leading actors, etc.

However, utility function \(u\) is usually not defined on the whole \(C\times S\) space, but only on some subset of it. In recommender systems, utility is typically represented by ratings and is initially defined only on the items that were previouslyy rated by users. Therefore, recommendation is actually the process of extrapolating unknown ratings, this is usually done by:

  • specifying heuristics that define the utility function and empirically validating its performance.
  • estimating the utility function that optimizes certain performance criterion, such as MSE.

Recommender systems are usually classified into the following categories based on how recommendations are made:

  • content-based recommendations: the user will be recommended items similar to the ones the user preferred in the past.
  • collaborative-filtering recommendations: the user will be recommended items that people with similar tastes and preferences liked in the past.
  • hybrid recommendations: these methods combine content-based and collaborative-filtering methods.

Or following categories based on recommendation methods (for rating estimation):

  • heuristic-based recommendations
  • model-based recommendations

In addition to rating-based recommender systems, there are also recommender systems that predict the relative preferences of users, etc.

1.1 Content-Based Methods

The utility function \(u(c, s)\) is estimated based on the utilities \(u(c, s_{i})\) assigned by user \(c\) to items \(s_{i}\in S\) that are similar to item \(s\). Therefore, only the items that have high degree of similarity to whatever the user’s preferences are would be recommended.

Because of the significant and early advancements made by the Information Retrieval and Filtering communities and the importance of several text-based applications, many content-based recommender systems focus on recommending items containing textual information, including documents, websites (URLs) and Usenet news messages.

More formally, let \(Content(s)\) be an item profile, i.e. a set of attributes characterizing item \(s\). It is usually computed by extracting a set of features from item \(s\), described as keywords. The importance of keywords \(k_{j}\) in documents \(d_{i}\) is determined with some weighs \(w_{ij}\). One of the most widely used measures for specifying keyword weights in Information Retrieval is TF-IDF. Suppose \(N\) is the total number of documents, and \(n_{i}\) of them contain keywords \(k_{j}\), and \(f_{ij}\) is the count of \(k_{j}\) in docment \(d_{i}\), then TF (Term-Frequency or normalized Term-Frequency) is defined as

\[
TF_{ij} = \frac{f_{ij}}{\max_{z}f_{zj}},
\]

where \(z\) is across all documents. The core idea is: keywords that appear in many documents are not useful in distinguishing between a relevant document and a non-relevant one. Inverse-Document-Frequency for keyword \(k_{j}\) is defined as

\[
IDF_{ij} = \log\frac{N}{n_{i}},
\]

then the overall TF-IDF value is

\[
w_{ij} = TF_{ij}\times IDF_{ij}.
\]

And the content of document \(d_{i}\) is defined as \(Content(d_{i}) = (w_{1i}, w_{2i}, \cdots, w_{ki})\).

In content-based recommender system, candidate items \(Content(s)\) are compared with items rated by user (a.k.a. the profile of user \(ContentBasedProfile(c)\)) and best matching items are recommended, i.e. calculating the utility function

\[
u(c, s) = score\left( ContentBasedProfile(c), Content(s) \right),
\]

there are plenty choices for the utility function, for example cosine-similarity, pearson-correlation, etc.

There are also several disadvantages for content-based recommendation:

  • Limited content analysis. Content-based recommendations are explicitly associated with the “keywords”, it would be hard to extract features for some domains, such as images, audio and video.
  • Over-specialization. The recommendation can only recommend items that highly correlated with users’ past preferences and have bad diversity. In some cases, the items should not be recommended if they are too similar to users already seen.
  • Cold start problem. New users don’t have sufficient rated items therefore it’s hard to use this approach.

1.2 Collaborative-Filtering Methods

Compared with content-based recommendation, collaborative-filtering could deal with any kind of content and recommend any items, event the ones that are dissimilar to those seen in the past.

For user-based collaborative-filtering, the utility function \(u(c, s)\) is estimated based on \(u(c_{j}, s)\) assigned to item \(s\) by other users \(c_{j}\in C\) who are similar to user \(c\).

There are two types of collaborative-filtering methods:

  • Memory-based (Heuristic-based)

Essentially it generate predictions \(r_{c, s}\) (ratings) based on the ratings of item \(s\) from a neighbor \(\hat{C}\) of user \(c\), a typical method is weighted average with weights to be user similarities:

\[
r_{c, s} = \frac{\sum_{c’\in \hat{C}} sim(c, c’)\cdot r_{c’, s}}{\sum_{c’\in \hat{C}} sim(c, c’)},
\]

there are some variants such as normalize \(r_{c’, s}\), etc. One problem with using the weighted average is it doesn’t take into account the fact that different users may have different rating scales.

Both content-based recommendation and user-based collaborative-filtering use similarity measures, but in content-based recommendation it is used to measure the similarities between user profile and item content; in user-based collaborative-filtering it is used to measure user ratings on items.

There is another method called item-based collaborative-filtering, the similarities are measured among items rather than users. It could provide better computational performance than user-based, and even provides better results.

  • Model-based

There have been many model-based collaborative-filtering proposed in the literature, including statistical models (both frequentist and bayesian), probabilistic relational model (view recommendation process as sequential decision problem and use Markov decision process; probabilistic LSA), linear regression, logistic regression, etc. In conclusion, model-based collaborative-filtering calculates utilities based on model learned from underlying data using statistical and machine learning techniques.

There are also several disadvantages for collaborative-filtering recommendation:

  • Cold start problem. This is same with content-based recommendation.
  • Sparsity. The number of ratings already obtained is usually very small compared to the number ratings need to be predicted. One way to overcome the problem is to use user profile information when calculating similarities. Dimension reduction (model-based) approaches such as SVD is also an alternative.

1.3 Hybrid Methods

Combining content-based recommendation and collaborative-filtering recommendation into hybrid recommendation could help to prevent certain limitations:

  • implementing two recommendation separately and combining their predictions

Using weighted average or majority vote on predictions.

  • incorporating some content-based characteristics into a collaborative approach

Maintain the content-based profiles for each user to calculating similarities (among users), this could overcome some sparsity issue, and recommend items to user that highly match user profiles.

  • incorporating some collaborative characteristics into a content-based approach

The most popular approach is to use some dimension reduction techniques on content-based profiles.

  • constructing a general unifying model that incorporates both content-based and collaborative characteristics

There are many approaches: rule-based classifier, probabilistic LSA, bayesian models. One of the approachesr is a statistical model uses profile information of user and item to estimates unknown rating \(r_{ij}\) for user \(i\) and item \(j\):

\[
r_{ij} = x_{ij}\mu + z_{i}\gamma_{j} + w_{j}\lambda_{i} + e_{ij},
\]

where \(\gamma_{j}\sim N(0, \Gamma)\) is unobserved item heterogeneity, \(\lambda_{i}\sim N(0, \Lambda)\) is unobserved user heterogeneity, \(e_{ij}\sim N(0, \sigma^{2})\) is the noise. The parameters are estimated from the data (known ratings) using MCMC.

2. Extending Capabilities of Recommender Systems

Recommender systems described above could be improved in multiple ways.

2.1 Comprehensive Understanding of Users and Items

Most recommender systems that estimate ratings don’t take full advantage of information of users’ transactional histories and other available data, such as keywords and simple user demographics, which could be useful. In addition, more advanced profiling techniques based on data mining rules, sequences and signatures that describe users’ interests can be used to build user profiles. While these techniques are already used in the context of Web usage analysis, i.e. discover the navigational Web usage patterns of users in order to provide better Web site recommendations.

2.2 Extensions for Model-Based Recommendations

Some of the model-based approaches provide rigorous rating estimation methods utilizing various statistical and machine learning techniques. Other areas of mathematics and computer science, such as approximation theory, can also contribute to develop better rating estimation.

2.3 Multidimensionality of Recommendations

The current generation of recommender systems operates in the two-dimensional User and Item space, they don’t take into consideration additional contextual information that may be crucial in some applications, such as time, place, company of user, etc.

On the other hand, many two-dimensional recommendation approaches cannot be directly extended to multidimensional case. In this case, selecting only the ratings relevant to a recommendation context is an alternative.

2.4 Multi-criteria Ratings

Ratings are not always the only consideration in recommendation. This involves some advanced optimization techniques.

2.5 Nonintrusiveness

Many recommender systems are intrusive in the sense that they require explicit feedback from the user and often at a significant level of user engagements, while some recommender systems use nonintrusive rating determination methods where certain proxies are used to estimate the real ratings. In these cases, minimizing intrusiveness while maintaining certain levels of accuracy of recommendations needs to be addressed, such as explore the optimal number of ratings the system should ask from new users.

2.6 Effectiveness of Recommendations

In most of recommender systems, the performance evaluation is done in terms of coverage and accuracy. One limitation is that these measures are typically performed on test data that the users chose to rate. In other words, the empirical evaluation results only show how good the system is on items the user decide to rate. Therefore, it is also important to develop economics-oriented measures that capture the business value of recommendations.