Pearson algorithm for finding similarity

We have used the Pearson algorithm given below. Ideally it should return values between -1 and 1, but with our large data set it returns values such as 1.0327955589886444 and 1.1547005383792517. Can you suggest a solution to this problem, or another efficient algorithm for finding similarity between users?

The input is a dictionary of users, their choices, and the ranking each user gave to each choice, e.g.:
dict1 = {user1: {choice1: rank1, choice2: rank1}, user2: {choice1: rank1, choice2: rank1}, user3: {choice1: rank1, choice2: rank1}}
In practice we are working with a very large dictionary in which the rankings vary.
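For readers who want something concrete to test against, here is a small sketch of what such a dictionary might look like. The user names, choice names, and rank values are made up for illustration and are not taken from the real data.

```python
# Hypothetical sample data: each user maps to {choice: rank}.
sample_prefs = {
    "user1": {"choice1": 5, "choice2": 3, "choice3": 1},
    "user2": {"choice1": 4, "choice2": 2},
    "user3": {"choice2": 1, "choice3": 5},
}

# Choices that two users have both ranked (the only ones Pearson can compare).
shared = sorted(set(sample_prefs["user1"]) & set(sample_prefs["user2"]))
print(shared)
```

Users do not need to have ranked the same set of choices; only the overlap is used when comparing two users.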

from math import sqrt

def pearson(prefs, p1, p2):
    # Get the list of mutually rated items
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item] = 1
    # If there are no ratings in common, return 0
    n = len(si)
    if n == 0:
        return 0
    # Sums of all the preferences
    sum1 = sum([prefs[p1][it] for it in si])
    sum2 = sum([prefs[p2][it] for it in si])
    # Sums of the squares
    sum1Sq = sum([pow(prefs[p1][it], 2) for it in si])
    sum2Sq = sum([pow(prefs[p2][it], 2) for it in si])
    # Sum of the products
    pSum = sum([prefs[p1][it] * prefs[p2][it] for it in si])
    # Calculate r (Pearson score) with the standard formula
    num = pSum - (sum1 * sum2 / n)
    den = sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))
    if den == 0:
        return 0
    r = num / den
    return r
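
One possible cause, offered as an assumption rather than a confirmed diagnosis: if this runs on Python 2 and the rankings are integers, divisions by the item count use integer division and truncate, which can push the score outside [-1, 1]. The size of the data set would not by itself produce that. Below is a minimal sketch that converts the rankings to floats and works with mean-centered values. The function name `pearson_centered` and the sample data are hypothetical, and the final clamp is an extra guard against floating-point rounding, not part of the Pearson formula.

```python
from math import sqrt

def pearson_centered(prefs, p1, p2):
    """Pearson score over the items both users ranked, using float arithmetic."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(shared)
    if n == 0:
        return 0.0
    # Convert to float so no division is truncated, even on Python 2.
    x = [float(prefs[p1][item]) for item in shared]
    y = [float(prefs[p2][item]) for item in shared]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        return 0.0
    # Clamp to absorb any tiny floating-point overshoot beyond [-1, 1].
    return max(-1.0, min(1.0, num / den))

# Hypothetical sample data for a quick check.
demo = {
    "a": {"x": 1, "y": 2, "z": 3},
    "b": {"x": 2, "y": 4, "z": 6},
    "c": {"x": 3, "y": 2, "z": 1},
}
print(pearson_centered(demo, "a", "b"))
print(pearson_centered(demo, "a", "c"))
```

In the sample data, users "a" and "b" rank in perfect agreement and "a" and "c" in perfect disagreement, so the two scores should sit at the ends of the valid range.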

