AlternatingLeastSquares

class implicit.als.AlternatingLeastSquares(factors=100, regularization=0.01, dtype=numpy.float32, use_native=True, use_cg=True, use_gpu=False, iterations=15, calculate_training_loss=False, num_threads=0, random_state=None)

Alternating Least Squares

A recommendation model based on the algorithms described in the paper ‘Collaborative Filtering for Implicit Feedback Datasets’, with performance optimizations described in ‘Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering’.
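For orientation, the objective minimized in the first paper can be written as below. The symbols follow that paper: p_ui is the binary preference, c_ui the confidence, x_u and y_i the user and item factor vectors, and λ the regularization weight. Mapping them onto user_factors, item_factors and regularization is an assumption made here for illustration.

```latex
\min_{x_*,\,y_*} \; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  \;+\; \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr)
```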

Parameters:
  • factors (int, optional) – The number of latent factors to compute
  • regularization (float, optional) – The regularization factor to use
  • dtype (data-type, optional) – Specifies whether to generate 64 bit or 32 bit floating point factors
  • use_native (bool, optional) – Use native extensions to speed up model fitting
  • use_cg (bool, optional) – Use a faster Conjugate Gradient solver to calculate factors
  • use_gpu (bool, optional) – Fit on the GPU if available. By default, the GPU is used only if one is available
  • iterations (int, optional) – The number of ALS iterations to use when fitting data
  • calculate_training_loss (bool, optional) – Whether to log the training loss at each iteration
  • num_threads (int, optional) – The number of threads to use for fitting the model. This only applies to the native extensions. A value of 0 uses the number of cores on the machine.
  • random_state (int, RandomState or None, optional) – The random state for seeding the initial item and user factors. Default is None.
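The use_cg option refers to the conjugate gradient method. As a rough illustration of the idea, the following is a minimal sketch of textbook conjugate gradient for a symmetric positive definite system A x = b. It is not the library's native solver; the function name and the steps and tol parameters are hypothetical.

```python
import numpy as np


def conjugate_gradient(A, b, steps=3, tol=1e-10):
    """Approximately solve A x = b for symmetric positive definite A.

    Hypothetical helper for illustration only.
    """
    b = np.asarray(b, dtype=float)
    x = np.zeros(len(b))          # start from the zero vector
    r = b - A @ x                 # residual
    p = r.copy()                  # first search direction
    rs_old = r @ r
    for _ in range(steps):
        if rs_old < tol:          # residual is already negligible
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)  # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p  # next conjugate direction
        rs_old = rs_new
    return x
```

Running only a few steps gives an approximate solution at lower cost than a full direct solve, which is the trade-off the option name suggests.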
item_factors

Array of latent factors for each item in the training set

Type: ndarray
user_factors

Array of latent factors for each user in the training set

Type: ndarray
explain(userid, user_items, itemid, user_weights=None, N=10)

Provides explanations for why the item is liked by the user.

Parameters:
  • userid (int) – The userid to explain recommendations for
  • user_items (csr_matrix) – Sparse matrix containing the liked items for the user
  • itemid (int) – The itemid to explain recommendations for
  • user_weights (ndarray, optional) – Precomputed Cholesky decomposition of the weighted user liked items. Useful for speeding up repeated calls to this function. This value is returned by each call (see Returns)
  • N (int, optional) – The number of liked items to show the contribution for
Returns:

  • total_score (float) – The total predicted score for this user/item pair
  • top_contributions (list) – A list of the top N (itemid, score) contributions for this user/item pair
  • user_weights (ndarray) – A factorized representation of the user. Passing this in to future ‘explain’ calls will lead to noticeable speedups
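One way to see how a predicted score splits into per-item contributions is the NumPy sketch below. It assumes the closed-form user solve from the implicit-feedback paper, with confidence 1 for items the user has not liked. It uses a plain linear solve rather than a cached Cholesky factorization, and explain_sketch with its arguments is a hypothetical helper, not the library's implementation.

```python
import numpy as np


def explain_sketch(item_factors, liked, confidence, itemid, regularization, N=10):
    """Split the score of `itemid` into contributions from liked items.

    liked: indices of the items the user liked.
    confidence: confidence value for each liked item (same order).
    Hypothetical helper for illustration only.
    """
    Y = np.asarray(item_factors, dtype=float)
    # A = Y^T C_u Y + lambda * I, where C_u is 1 for every item not liked
    A = Y.T @ Y + regularization * np.eye(Y.shape[1])
    for j, c in zip(liked, confidence):
        A += (c - 1.0) * np.outer(Y[j], Y[j])
    # w = A^{-1} y_itemid; A is symmetric, so score = sum_j c_j * (y_j . w)
    w = np.linalg.solve(A, Y[itemid])
    contributions = [(int(j), float(c * (Y[j] @ w))) for j, c in zip(liked, confidence)]
    total_score = sum(score for _, score in contributions)
    contributions.sort(key=lambda pair: pair[1], reverse=True)
    return total_score, contributions[:N]
```

The contributions sum to the same score that the dot product of the solved user factor and the item factor would give.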

fit(item_users, show_progress=True)

Factorizes the item_users matrix.

After calling this method, the members ‘user_factors’ and ‘item_factors’ will be initialized with a latent factor model of the input data.

The item_users matrix does double duty here. It defines which items are liked by which users (P_iu in the original paper), as well as how much confidence we have that the user liked the item (C_iu).

The negative items are implicitly defined: this code assumes that a positive value in the item_users matrix means that the user liked the item. The negatives are left unset in this sparse matrix, and the library treats every unset entry as P_iu = 0 and C_iu = 1. Negative items can also be given a higher confidence by passing a negative value, which indicates that the user disliked the item.
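The mapping from stored values to preference and confidence described above can be sketched on a small dense array. Taking the confidence of a negative entry to be its absolute value is an assumption made here, and preference_and_confidence is a hypothetical helper; the library itself works on the sparse matrix directly.

```python
import numpy as np


def preference_and_confidence(item_users):
    """Derive dense P_iu and C_iu arrays from an items x users array.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(item_users, dtype=float)
    # P_iu = 1 only where a positive value is stored
    preference = (values > 0).astype(float)
    # C_iu = 1 for unset entries, otherwise the magnitude of the stored value
    # (assumption: a negative entry carries confidence |value|)
    confidence = np.where(values != 0, np.abs(values), 1.0)
    return preference, confidence


# 3 items x 2 users: user 0 liked item 0, user 1 disliked item 1
P, C = preference_and_confidence([[3, 0], [0, -2], [0, 0]])
```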

Parameters:
  • item_users (csr_matrix) – Matrix of confidences for the liked items. This matrix should be a csr_matrix where the rows of the matrix are the items, the columns are the users that liked that item, and the value is the confidence that the user liked the item.
  • show_progress (bool, optional) – Whether to show a progress bar during fitting
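As the name suggests, alternating least squares holds one set of factors fixed while solving for the other, then swaps. A minimal sketch of the exact regularized least-squares update for a single user, with the item factors held fixed, might look like the following. update_user_factor is a hypothetical helper, and a dense direct solve is used here in place of whatever solver the library selects.

```python
import numpy as np


def update_user_factor(item_factors, preference_u, confidence_u, regularization):
    """Solve (Y^T C_u Y + lambda * I) x_u = Y^T C_u p_u for one user.

    preference_u, confidence_u: one entry per item (P_iu and C_iu for this user).
    Hypothetical helper for illustration only.
    """
    Y = np.asarray(item_factors, dtype=float)
    # Weight each item's factor row by the user's confidence in that item
    A = Y.T @ (confidence_u[:, None] * Y) + regularization * np.eye(Y.shape[1])
    b = Y.T @ (confidence_u * preference_u)
    return np.linalg.solve(A, b)
```

The item update has the same form with the roles of users and items exchanged.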
rank_items()

Ranks the given items for a user and returns them as a sorted list.

Parameters:
  • userid (int) – The userid to calculate recommendations for
  • user_items (csr_matrix) – A sparse matrix of shape (number_users, number_items). This lets us look up the liked items and their weights for the user. This is used to filter out items that have already been liked from the output, and to also potentially calculate the best items for this user.
  • selected_items (list of itemids) – The items to rank for the user
  • recalculate_user (bool, optional) – When true, don’t rely on stored user state and instead recalculate from the passed in user_items
Returns:

List of (itemid, score) tuples. It contains only items that appear in the selected_items parameter.

Return type:

list

recommend()

Recommends items for a user

Calculates the N best recommendations for a user, and returns a list of (itemid, score) tuples.

Parameters:
  • userid (int) – The userid to calculate recommendations for
  • user_items (csr_matrix) – A sparse matrix of shape (number_users, number_items). This lets us look up the liked items and their weights for the user. This is used to filter out items that have already been liked from the output, and to also potentially calculate the best items for this user.
  • N (int, optional) – The number of results to return
  • filter_already_liked_items (bool, optional) – When true, don’t return items present in the training set that were rated by the specified user.
  • filter_items (sequence of ints, optional) – List of extra item ids to filter out from the output
  • recalculate_user (bool, optional) – When true, don’t rely on stored user state and instead recalculate from the passed in user_items
Returns:

List of (itemid, score) tuples

Return type:

list
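The scoring and filtering described above can be sketched in NumPy, assuming the score of an item is the dot product of the user factor with that item's factor. recommend_sketch and its liked argument are hypothetical; the library's own method takes a userid and the sparse user_items matrix instead.

```python
import numpy as np


def recommend_sketch(user_factor, item_factors, liked=(), N=10):
    """Return up to N (itemid, score) tuples, best first, skipping liked items.

    Hypothetical helper for illustration only.
    """
    scores = (np.asarray(item_factors, dtype=float) @ user_factor).astype(float)
    if len(liked):
        scores[list(liked)] = -np.inf      # filter already liked items
    N = min(N, len(scores))
    if N <= 0:
        return []
    top = np.argpartition(scores, -N)[-N:]  # unordered indices of the N best
    top = top[np.argsort(-scores[top])]     # order them by descending score
    return [(int(i), float(scores[i])) for i in top if np.isfinite(scores[i])]
```

Using argpartition avoids sorting every item when only the top N are needed.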

recommend_all()

Recommends items for all users

Calculates the N best recommendations for all users, and returns a numpy ndarray of shape (number_users, N) with item ids in descending probability order

Parameters:
  • self (implicit.als.AlternatingLeastSquares) – The fitted recommendation model
  • user_items (csr_matrix) – A sparse matrix of shape (number_users, number_items). This lets us look up the liked items and their weights for the user. This is used to filter out items that have already been liked from the output, and to also potentially calculate the best items for this user.
  • N (int, optional) – The number of results to return
  • recalculate_user (bool, optional) – When true, don’t rely on stored user state and instead recalculate from the passed in user_items
  • filter_already_liked_items (bool, optional) – This is used to filter out items that have already been liked from the user_items
  • filter_items (list, optional) – List of item ids to exclude from recommendations for all users
  • num_threads (int, optional) – The number of threads to use for sorting scores in parallel by users. Default is number of cores on machine
  • show_progress (bool, optional) – Whether to show a progress bar
  • batch_size (int, optional) – To limit memory usage during matrix multiplication, users are split into batches and scored one batch at a time. Defaults to num_threads * 100
  • users_items_offset (int, optional) – Allows a slice of the user_items matrix to be passed, so that the calculation can be split into parts
Returns:

Array of shape (number_users, N) with item ids in descending probability order

Return type:

numpy ndarray
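The role of batch_size can be illustrated with a single-threaded NumPy sketch that scores users one batch at a time, so that only a (batch_size, number_items) score matrix is held in memory at once. recommend_all_sketch is a hypothetical helper and omits filtering, threading and progress reporting.

```python
import numpy as np


def recommend_all_sketch(user_factors, item_factors, N=10, batch_size=100):
    """Return an array of shape (number_users, N) with item ids, best first.

    Hypothetical helper for illustration only.
    """
    n_users = user_factors.shape[0]
    N = min(N, item_factors.shape[0])
    result = np.zeros((n_users, N), dtype=np.int64)
    for start in range(0, n_users, batch_size):
        batch = user_factors[start:start + batch_size]
        scores = batch @ item_factors.T              # (batch, number_items)
        # Full sort per row for clarity; keep the N highest-scoring item ids
        result[start:start + batch_size] = np.argsort(-scores, axis=1)[:, :N]
    return result
```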

similar_items()

Calculates a list of similar items

Parameters:
  • itemid (int) – The row id of the item to retrieve similar items for
  • N (int, optional) – The number of similar items to return
Returns:

List of (itemid, score) tuples

Return type:

list

similar_users()

Calculates a list of similar users

Parameters:
  • userid (int) – The row id of the user to retrieve similar users for
  • N (int, optional) – The number of similar users to return
Returns:

List of (userid, score) tuples

Return type:

list
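Both similar_items and similar_users return (id, score) tuples. This page does not say how the score is defined; the sketch below assumes cosine similarity between factor rows, which is a common choice for latent factor models. similar_sketch is a hypothetical helper that works on either item_factors or user_factors.

```python
import numpy as np


def similar_sketch(factors, rowid, N=10):
    """Return the N rows most similar to `rowid` as (id, score) tuples.

    Assumes cosine similarity; hypothetical helper for illustration only.
    """
    factors = np.asarray(factors, dtype=float)
    norms = np.linalg.norm(factors, axis=1)
    norms[norms == 0] = 1e-10                 # avoid division by zero
    scores = factors @ factors[rowid] / (norms * norms[rowid])
    best = np.argsort(-scores, kind="stable")[:N]
    return [(int(i), float(scores[i])) for i in best]
```

Under this assumption the queried row itself appears in the result with a score of 1.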