Recommender algorithm in Mathematica

This post is a bit geeky, but never mind!
In our Lab we are not wedded to any particular tool or method for coding or analyzing data. We stick to a simple strategy: choose the tool that best fits the task! OK, sometimes you don’t know in advance what the best tool is, but with time you gain experience and develop a pretty high hit rate in picking the right tool for your purposes. Our experience has led to the following preferences:

  • General data analysis: R, Python, and tools like awk, sed.
  • Fast prototyping and simulations: R, Python
  • Mathematical analysis for differential equations etc. : Mathematica
  • Performance-critical stuff: C, C++ (yes). One team member still favors FORTRAN 🙂
  • Databases: MySQL and some NoSQL

If you are not interested in technical details, then here is the message to take from this post:

Don’t try to do everything with the same method, programming language, algorithms, and tools. Be flexible and figure out the most efficient way to solve a particular task! Don’t be addicted.

So far, so good. For our recommender research tasks we mainly use Python. This is our comfort zone. Sometimes it’s good to step out of the comfort zone, right? For fun I decided to implement a simple recommender algorithm (B-Rank) in Mathematica. This is definitely outside the comfort zone, since for this kind of work we already have our best option: Python with NumPy and SciPy. We are happy with it in terms of performance and scalability. The code translation from Python to Mathematica was straightforward. However, the Mathematica code reads a bit cryptically if one is not used to it. I was wondering how the performance of this implementation compares to the Python version. The difference is striking.

I used the well-known “movielens” data set for testing. The data set contains 943 users, 1682 movies, and 100’000 ratings in total. The B-Rank algorithm is based on a random walk on a bipartite graph; for details, check the paper. Both SciPy (Python) and Mathematica provide sparse data structures, which is crucial in terms of performance. In both cases, Mathematica and Python, I used a straightforward implementation with the provided structures and routines.
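Just to make the data handling concrete, here is a minimal sketch (not the benchmark code, which follows at the end of the post) of loading u.data into a SciPy sparse matrix; it assumes u.data is in the working directory and uses the standard movielens format of tab-separated user id, item id, rating, timestamp with 1-based ids:

import numpy as np
from scipy import sparse

# u.data: one rating per line, columns user id, item id, rating, timestamp
raw = np.loadtxt("u.data", dtype=int)
users, items, ratings = raw[:, 0] - 1, raw[:, 1] - 1, raw[:, 2]  # 0-based ids
R = sparse.coo_matrix((ratings, (users, items))).tocsr()
print(R.shape)  # (943, 1682)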
Here we go with the Mathematica stuff:

dir = "/Users/blattner/LWS/team-codes/data/movielens";
SetDirectory[dir];
alpha = 0.3;
user = 1;

Here we just set the working directory and a parameter (alpha) equal to 0.3. Alpha tunes the diversity of the recommendation results. For this showcase we focus on one particular user (user = 1).

Clear[a, ind, vals, m]
a = Import["u.data", "Table"];

Next we import the data in table form. Then we construct a sparse matrix. This is needed because most of the entries (movie ratings) are EMPTY: 100’000 ratings in a 943×1682 matrix fill only about 6% of the entries.

vals = #[[3]] & /@ a;
ind = {#[[1]], #[[2]]} & /@ a;
m = SparseArray[ind -> vals] ;
dims = Dimensions[m]

Now we have the matrix m with dimensions 943×1682 and have saved its dimensions for later use. Before starting with the recommendation calculations, let’s check some simple statistics. First we look at the distribution of how many movies each user has rated.

Histogram[Total[Sign[Transpose[m]], 1], AxesLabel -> {TotalVotes, Frequency}]

The output looks like this:

[Figure: Histogram of rated movies per user.]

This distribution looks pretty fat-tailed, which is what we expect from earlier analyses. Now we do the same for the attendance distribution (how many times a movie has been watched).

Histogram[Total[Sign[m], 1], AxesLabel -> {Attendance, Frequency}]

And the output:

[Figure: Histogram of attendance frequency per movie.]

This distribution is also highly fat-tailed. Finally, we inspect the distribution of the mean vote each user has given. The rating scale runs from 1 to 5.

ind = Join @@ Drop[Map[First, ArrayRules[m[[#, All]]]], -1] & /@ Range[Dimensions[m][[1]]];
mm = Mean @ m[[#, ind[[#]]]] & /@ Range[Dimensions[m][[1]]];
Histogram[mm, AxesLabel -> {MeanVote (User), Frequency}]

And again the output:

[Figure: Histogram of mean votes per user.]

The distribution is more symmetric and centered around 3.7. OK, so far no rocket science, and it’s not worth tabulating the performance up to this point, since everything went smoothly. Now the fun part starts: constructing the propagator matrix used to propagate the signal on the bipartite graph.
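In matrix notation (my reading of the code below; the paper’s own symbols may differ): let $S$ be the $1682 \times 943$ movie-user indicator matrix with $S_{iu} = 1$ if user $u$ rated movie $i$, let $D = \mathrm{diag}(k_u)$ hold the user degrees $k_u = \sum_i S_{iu}$, and let $W = \mathrm{diag}(\bar{v}_u)$ hold each user’s mean normalized vote. Then

$$B = S \, W \, D^{-1} S^{\top}, \qquad B_{ii} \leftarrow 0, \qquad P = K^{-1} B,$$

where $K$ is the diagonal matrix of the column sums of $B$. The Mathematica code is a straightforward implementation of these steps: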

m = Transpose[m]
dims = Dimensions[m]
(*calculate inverse diagonal degree matrix first*)
dinv = SparseArray[Band[{1, 1}] -> 1/Total[Sign[m]], {dims[[2]], dims[[2]]}];
(*rescale votes*)
mnorm = m/5;
(*generate diagonal weight matrix*)
w = Total[mnorm]/Total[Sign[m]];
W = SparseArray[Band[{1, 1}] -> w, {dims[[2]], dims[[2]]}];
(*calculate intermediate matrix*)
B = Dot[Sign[m], Dot[W, Dot[dinv, Transpose[Sign[m]]]]];
(*remove diagonal to avoid back propagation to source*)
B = ReplacePart[Normal[B], {i_, i_} -> 0];
kinv = SparseArray[Band[{1, 1}] -> 1/Total[B], {dims[[1]], dims[[1]]}];
(*finally calculate the propagator*)
P = Dot[kinv, B];
m = Transpose[m];
(*get indices with votes >= 3*)
in = Flatten[Position[Normal[m[[user, All]]], _?(# >= 3 &)]];
(*generate indicator vector for propagation*)
P = SparseArray[P];

dinv and W are diagonal matrices, and their construction is equally fast in Python and Mathematica. However, trouble started with the construction of B, the intermediate matrix needed to build the final propagator P. That statement consists of three matrix multiplications and takes ages compared to the faster-than-light Python implementation. In detail, the first part (the innermost two matrix multiplications) was somewhat comparable:

Dot[W, Dot[dinv, Transpose[Sign[m]]]]

Mathematica: 0.047 s
Python:      0.010 s

That’s OK, but already here we observe a difference.

The second part is the real killer:

B = Dot[Sign[m], Dot[W, Dot[dinv, Transpose[Sign[m]]]]]

Mathematica: 20.57 s
Python:       0.11 s

Gotcha! That’s incredible. Again, I used just the straightforward structures and routines provided for sparse data in both ‘languages’. I don’t think I applied anything so strangely that this should happen. I am really surprised.
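For reference, here is a minimal, self-contained SciPy sketch of that triple product. It uses random stand-in data with roughly the movielens proportions, so the timing is only indicative; it is not the benchmark code (the real code follows at the end of the post):

import numpy as np
from scipy import sparse
from time import time

# stand-in data: S plays the role of Sign[m] (1682 movies x 943 users,
# ~6% filled, matching movielens); W and Dinv are diagonal as in the post
S = sparse.rand(1682, 943, density=0.063, format="csc", random_state=0)
S.data[:] = 1.0                                  # indicator entries only
deg = np.asarray(S.sum(axis=0)).ravel()          # per-user degree
Dinv = sparse.diags(1.0 / np.maximum(deg, 1.0))  # inverse degree matrix
W = sparse.diags(np.random.rand(943))            # stand-in user weights

t0 = time()
B = S.dot(W.dot(Dinv.dot(S.T)))                  # the costly triple product
print("triple product: %.3f s" % (time() - t0))

As far as I can tell, this is the same chain of sparse products as the Mathematica statement above, just spelled with SciPy’s .dot.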
Next, we have to calculate the final propagator:

B = ReplacePart[Normal[B], {i_, i_} :> 0];
kinv = SparseArray[Band[{1, 1}] -> 1/Total[B], {dims[[1]], dims[[1]]}];
(*finally calculate the propagator*)
P = Dot[kinv, B];
m = Transpose[m];
(*get indices with votes >= 3*)
in = Flatten[Position[Normal[m[[user, All]]], _?(# >= 3 &)]];
(*generate indicator vector for propagation*)
P = SparseArray[P];

The crucial part is the matrix multiplication P = Dot[kinv, B]:

Mathematica: 6.2 s
Python:      0.04 s

This difference is striking too! The Position statement above extracts the indices of the movies user 1 has already voted on. Now we are left with the recommendations. In Mathematica I did this just for one user (user 1), whereas in Python I calculated the recommendations for all users at once! This makes the result even more surprising. Here we go with the Mathematica code:

fp = Plus @@ # & /@ P[[All, in]] // N; (*forward prop.*)
Pt = Transpose[P];
bp = Plus @@ # & /@ Pt[[All, in]] // N; (*backward prop.*)
rk = fp^(alpha) * bp^(1 - alpha);

and

rk = fp^(alpha) * bp^(1 - alpha);
rk[[in]] = 0;
ord = Reverse[Ordering[rk]];

fp is the forward-propagation ranking and bp is the backward-propagation ranking. The final recommendation is obtained by multiplying both vectors element-wise (after exponentiation with alpha); the resulting ranking ord contains the indices of the suggested movies in descending order of score. Mathematica is incredibly slow in this case as well. The timing for the two blocks above together:

Mathematica: 22.0 s (for one user!)
Python:       0.8 s (for all 943 users!)

The statement taking most of the time in Mathematica is the calculation of bp (the backward-propagation-based recommendations). And there is no ordering problem: I checked by transposing everything and ended up with the same results.
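Written out (again my notation, read directly off the Mathematica blocks above), with $h$ the indicator vector of the movies the user rated with at least 3, and $\alpha = 0.3$:

$$f = P \, h, \qquad b = P^{\top} h, \qquad r_i = f_i^{\alpha} \, b_i^{1-\alpha},$$

where the entries of $r$ belonging to already-voted movies are set to zero before sorting in descending order.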

So, what’s the lesson to learn here? Don’t try to do everything with the same method, programming language, algorithms, and tools. Be flexible, figure out what solves your problems most efficiently, and use it!

Don’t be addicted.
Below is the Python code:

'''
Note: this code is only lightly commented and was used
only for the comparison found in our blog post.

You need scipy, numpy, time and csv modules installed

The routines 'assume' to deal with movielens
data.

A:       movielens rating matrix (lil_matrix scipy)
alpha:   diversity tuning for B-Rank
thr:     threshold for positive votes
vrange:  normalizer (5.0 in the case of movielens)

16/04/12 marcel.blattner@ffhs.ch

'''

import numpy as np
from scipy import sparse as sc
import csv
from time import clock

class recpy():
    def __init__(self, alpha=0.3, thr=3.0, vrange=5.0):

            self.alpha = alpha      # diversity tuning for B-Rank
            self.thr = thr          # threshold for positive votes
            self.vrange = vrange    # rating normalizer (5.0 for movielens)

    def gen_matrix_from_csv(self,**kwargs):

            usrcol = kwargs['usercol']
            objcol = kwargs['objectcol']
            ratcol = kwargs['ratingcol']
            csvfile = kwargs['file']

            #check for format
            csvfile = open(csvfile, "rb")
            dialect = csv.Sniffer().sniff(csvfile.read(1024))
            csvfile.seek(0)
            #initiate reader
            readers = csv.reader(csvfile, dialect)

            #user conversion
            convuser = dict()
            convitem = dict()

            z1 = 0
            z2 = 0

            for row in readers:

                    if not (convuser.has_key(int(row[usrcol]))):
                                convuser[int(row[usrcol])]=z1
                                z1 = z1 + 1
                    if not (convitem.has_key(int(row[objcol]))):
                                convitem[int(row[objcol])]=z2
                                z2 = z2 + 1

            A = sc.lil_matrix((len(convuser),len(convitem)))
            print "Matrix Dimension: ", A.shape

            csvfile.seek(0)
            #initiate reader
            readers = csv.reader(csvfile, dialect)

            for row in readers:
                    usr_col = convuser.get(int(row[usrcol]))
                    obj_col = convitem.get(int(row[objcol]))
                    if ratcol == -1:
                            A[usr_col,obj_col]=1
                    else:
                            A[usr_col,obj_col]=float(row[ratcol])

            return A, convuser,convitem

    def gen_stoc_matrix_brank(self,A):
            print "Convert to coo format: "
            A = A.tocoo()
            A = A.T
            mask = A.data > 0
            row = A.row[mask]
            col = A.col[mask]
            print "create Dinv Matrix: "
            H = sc.coo_matrix((np.ones(row.size),(row,col)),shape=(A.shape[0],A.shape[1]))
            s=np.array(H.sum(axis=0))[0]
            sind = (s==0)
            s[sind]=0.00001
            sinv = 1./s
            Dinv = sc.dia_matrix((sinv,0),(A.shape[1],A.shape[1]))
            print "convert to csc format: "
            Dinv = Dinv.tocsc()

            #normalization
            print "create weight matrix: "
            Vnorm = A/self.vrange
            del A #A is not needed anymore
            w = Vnorm.sum(axis=0)/s
            w = np.array(w)[0]

            W = sc.dia_matrix((w,0),(w.size,w.size))
            W = W.tocsc()

            H = sc.csc_matrix(H)

            B = H.dot(W.dot(Dinv.dot(H.transpose())))
            del H
            #del W
            del Dinv
            #strategy: first create dia_matrix then substract
            #remove diagonal
            print "create diagonal matrix: "
            D = sc.dia_matrix((B.diagonal(),0),(B.shape[1],B.shape[1]))
            D = D.tocsc()
            B = B.tocsc()
            B = B - D
            s = B.sum(axis=0)
            s = np.array(s)[0]
            sind = (s==0)
            s[sind]=0.00001
            s = 1./s
            Kinv = sc.dia_matrix((s,0),(B.shape[0],B.shape[0]))

            Kinv = Kinv.tocsc()
            print "calculate propagator matrix: "
            P = Kinv.dot(B)

            return P

    def get_pred_list(self,RANKING,lst_length):
            #RANKING format: userxobject
            RANKING = np.array(RANKING.todense())
            ARG = RANKING.argsort()
            ARG = np.fliplr(ARG)
            LST_IND = ARG[:,0:(lst_length)]
            s = np.arange(RANKING.shape[0]).repeat(lst_length,axis=None)
            LST = sc.coo_matrix((np.ones(LST_IND.shape[0]*LST_IND.shape[1]),(s,LST_IND.flatten())),shape=(RANKING.shape[0],RANKING.shape[1]))
            LST = LST.tocsc()
            return LST_IND, LST

    def b_rank_optimized(self,A):
            st = clock()
            A = A.tocsr()
            print "Crunching the PropagationMatrix :"
            P = self.gen_stoc_matrix_brank(A)

            #A should be a lil matrix
            #copy to generate indicator matrix (is of format lil)

            print "Convert matrix to handy coo-format: "
            A = A.tocoo()

            mask = A.data >= self.thr
            col = A.col[mask]
            row = A.row[mask]
            #instead of doing something H[row,col]=0 it is faster to construct an
            #indicator matrix K and to multiply element by element
            K = sc.coo_matrix((np.ones(row.size),(row,col)),shape=(A.shape[0],A.shape[1]))
            H = A.multiply(K)
            H.data = H.data/H.data
            del K # K is not needed anymore
            H = H.tocsr()
            #create dict with indices to predict
            print "Calculate forward propagation :"
            FW = P.transpose().dot(H.transpose())
            print "Calculate backward propagation :"
            FB = P.dot(H.transpose())
            print "Calculate final ranking :"
            del H
            # H not needed anymore
            FW.data = FW.data**self.alpha
            FB.data = FB.data**(1 -self.alpha)
            RANKING = FW.multiply(FB)

            #Setting already voted objects to zero in ranking-matrix
            A = A.tocsc()
            row, col = A.nonzero()
            K = sc.coo_matrix((np.ones(row.size),(row,col)),shape=(A.shape[0],A.shape[1]))
            RANKING = RANKING - RANKING.multiply(K.transpose())

            en = clock()
            print "Stuff done in: %3.6f seconds" % float(en-st)
            return RANKING.T

def main():
    rc = recpy()
    print "Start reading data......"
    A, n, nn = rc.gen_matrix_from_csv(usercol=0, file="./u.data", objectcol=1, ratingcol=2)
    print "Data read....."
    print "--------------"
    print " "
    print "Start awesome calculations......"
    print " "
    RANK = rc.b_rank_optimized(A)
    PRED, nil = rc.get_pred_list(RANK, 20)
    print " "
    print "Predictions for user 1 (just as an example): "
    print " "
    print PRED[0,:]

if __name__ == "__main__":
    main()



3 Responses to Recommender algorithm in Mathematica

  1. Andrew Moylan says:

    Hi Marcel, an interesting post, thanks.

    However your timing comparison between Python and Mathematica isn’t at all fair. This is because your Mathematica code is using exact numbers (integers and fractions) at a couple of places where Python would not.

    In particular, after your line defining W, here is one of the elements of W:

    In[174]:= Max[W]
    Out[174]= 112/115

    Actually all of the elements of W are infinite precision integers or fractions and Mathematica will work with them in this form as long as possible, whereas in Python you would be using fast inexact (machine precision) numbers at this stage.

    If you insert this line right after the definition of W:

    W = N[W];

    then W becomes a sparse array of fast inexact numbers and you get a much fairer comparison against Python.

  2. Andrew Moylan says:

    The next largest inefficiency is in the line in which you zero the diagonal of B:

    B = ReplacePart[Normal[B], {i_, i_} :> 0];

    Here Normal[B] converts B from a sparse array into a full dense array, which is slow (and presumably is not done in the Python code). Instead you could compute this in a sparse way:

    B *= SparseArray[{i_, i_} -> 0, Dimensions[B], 1]

    I find this is about 30 to 40 times faster.

    • Blattner says:

      Hey Andrew

      Thanks a lot for your answer and suggestions. Yes, you are absolutely right. Your suggestions make things much better. After incorporating your comments, I end up with the following code:

      Timing[Module[{}, m = Transpose[m];
        dims = Dimensions[m]; 
        dinv = SparseArray[
          Band[{1, 1}] -> 1/Total[Sign[m]], {dims[[2]], dims[[2]]}]; 
        mnorm = m/5; w = Total[mnorm]/Total[Sign[m]]; 
        W = SparseArray[Band[{1, 1}] -> w, {dims[[2]], dims[[2]]}]; 
        W = N[W]; B = Dot[Sign[m], Dot[W, Dot[dinv, Transpose[Sign[m]]]]]; 
        B = B*SparseArray[{i_, i_} -> 0, Dimensions[B], 1]; 
        kinv = SparseArray[
          Band[{1, 1}] -> 1/Total[B], {dims[[1]], dims[[1]]}]; 
        P = Dot[kinv, B];]]
      

      and

      Timing[Module[{}, m = Transpose[m]; 
        in = Position[Normal[m], _?(# >= 3 &)]; 
        H = SparseArray[in -> Table[1, {Length[in]}]]; P = SparseArray[P]; 
        fp = Dot[Transpose[P], Transpose[H]]; bp = Dot[P, Transpose[H]]; 
        rk = fp^(alpha) * bp ^(1 - alpha);]]
      

      This makes things indeed faster.
      Mathematica: 4.7 s
      Python:      1.2 s

      Now, I am happy!
