Pearson algorithm for finding similarity - Programmers Heaven

# Pearson algorithm for finding similarity

We have used the Pearson algorithm given below. Ideally it should return values between -1 and 1, but with our large data set it is returning values such as 1.0327955589886444 and 1.1547005383792517. Can you suggest a solution to this problem, or another efficient algorithm for finding similarity between users?

The input is a dictionary of users, their choices, and the ranking each user gave to each choice, e.g.:
dict1 = {user1: {choice1: rank1, choice2: rank1}, user2: {choice1: rank1, choice2: rank1}, user3: {choice1: rank1, choice2: rank1}}
However, we are working with a very large dictionary in which the rankings vary.
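
For concreteness, a tiny dictionary of that shape could look like the sketch below. The user names, choice names, and rankings are made up for illustration, and `shared_choices` is a hypothetical helper showing which choices two users have both ranked.

```python
# Hypothetical example data: each user maps choices to a numeric ranking.
prefs = {
    "user1": {"choice1": 4, "choice2": 2, "choice3": 5},
    "user2": {"choice1": 3, "choice2": 1},
    "user3": {"choice2": 5, "choice3": 2},
}


def shared_choices(prefs, p1, p2):
    """Return the sorted list of choices that both p1 and p2 have ranked."""
    return sorted(set(prefs[p1]) & set(prefs[p2]))


print(shared_choices(prefs, "user1", "user2"))
```

Only the choices returned by such a helper take part in the similarity score for that pair of users.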

import math

def pearson(prefs, p1, p2):
    # Get the list of mutually rated items
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item] = 1
    # If there are no ratings in common, return 0
    if len(si) == 0:
        return 0
    # Number of shared items, kept as a float so that the divisions below
    # do not truncate under Python 2 when the rankings are integers
    n = float(len(si))
    # Sums of all the preferences
    sum1 = sum([prefs[p1][it] for it in si])
    sum2 = sum([prefs[p2][it] for it in si])
    # Sums of the squares
    sum1Sq = sum([pow(prefs[p1][it], 2) for it in si])
    sum2Sq = sum([pow(prefs[p2][it], 2) for it in si])
    # Sum of the products
    pSum = sum([prefs[p1][it] * prefs[p2][it] for it in si])
    # Calculate r (Pearson score)
    num = pSum - (sum1 * sum2 / n)
    den = math.sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))
    # A zero denominator means at least one user has no variance in the shared items
    if den == 0:
        return 0
    r = num / den
    return r
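
One thing that may be worth checking, assuming the code runs under Python 2 and the rankings are integers: `/` between two integers truncates there, which alone can push the score above 1 regardless of how large the data set is. The sketch below is written for Python 3 and imitates that truncation with `//`; the function `pearson_from_lists`, its `truncate` flag, and the two ranking lists are hypothetical and exist only to illustrate the effect.

```python
import math


def pearson_from_lists(xs, ys, truncate=False):
    """Pearson r for two equal-length lists of rankings.

    truncate=True imitates Python 2 integer division by using floor division.
    """
    n = len(xs)
    if truncate:
        div = lambda a, b: a // b  # integer division, as in Python 2 with ints
    else:
        div = lambda a, b: a / b   # true division
    sum1, sum2 = sum(xs), sum(ys)
    sum1_sq = sum(x * x for x in xs)
    sum2_sq = sum(y * y for y in ys)
    p_sum = sum(x * y for x, y in zip(xs, ys))
    num = p_sum - div(sum1 * sum2, n)
    den = math.sqrt((sum1_sq - div(sum1 ** 2, n)) * (sum2_sq - div(sum2 ** 2, n)))
    return num / den if den else 0.0


# Hypothetical rankings of two users over three shared choices.
user1 = [1, 1, 2]
user2 = [1, 1, 3]

r_truncated = pearson_from_lists(user1, user2, truncate=True)
r_true = pearson_from_lists(user1, user2)
print(r_truncated, r_true)
```

With these made-up rankings the truncating variant works out to 2/sqrt(3), roughly 1.1547, while true division gives a value of about 1, which is what a perfectly linear relationship should produce. If this is the cause, converting the rankings or `n` to `float`, or adding `from __future__ import division` at the top of the module, keeps the result in range up to floating-point rounding.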