Basics of Cosine Similarity

Over the last few decades, recommendation systems from big companies such as the FANG group (Facebook, Amazon, Netflix, Google), YouTube, and many other social and commercial services have taken up a growing place in our lives. Instead of having to browse thousands of box sets and movie titles, a search for “Toy Story” also shows us “Toy Story 2”, “Toy Story 3”, and so on, and a search for a place shows restaurants and historical sites related to it. Netflix organized a challenge with a 1-million-dollar prize, where the goal was to create a recommendation system that performed better than its own algorithm.
 

In this article, I will explain how the simple cosine function is used in recommendation systems. Euclidean distance, Manhattan distance, or more advanced algorithms can also be used to measure similarity.
 

What is Cosine Similarity?
 

Cosine similarity measures the cosine of the angle between two non-zero vectors of an inner product space. The measurement is concerned with orientation, not magnitude. If two vectors point in the same direction, the similarity is 1. If they are perpendicular, it is 0, and if they are diametrically opposite, it is -1.
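The three cases above can be checked with a minimal sketch in plain Python. The function name cosine_similarity and the example vectors are illustrative assumptions, not something taken from a particular library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Illustrative vectors (assumed for this example)
same = cosine_similarity([1, 2], [2, 4])           # same direction, different length
perpendicular = cosine_similarity([1, 0], [0, 1])  # right angle
opposite = cosine_similarity([1, 2], [-1, -2])     # opposite direction

print(same, perpendicular, opposite)
```

Note that [2, 4] is twice as long as [1, 2], yet the similarity is still (up to floating-point rounding) 1: only the orientation matters.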
 

Relation Between Cosine Similarity and Cosine Distance for recommendation:
 

Cosine similarity and cosine distance have an inverse relation. When the similarity decreases, the distance between two vectors increases.
When the similarity increases, the distance between two vectors decreases.
 

1 – Cosine Similarity = Cosine Distance
Cosine Similarity = Cos Θ (Θ is the angle between a and b)
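
For reference, the standard definition of the cosine of the angle between two vectors can be written out as follows. The component notation a_i, b_i and the dimension n are assumptions introduced here for illustration.

```latex
\[
\cos\theta
  = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}} \, \sqrt{\sum_{i=1}^{n} b_i^{2}}},
\qquad
\text{Cosine Distance} = 1 - \cos\theta
\]
```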

But why?
 


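Cosine distance measures the angular difference between two vectors. We want similar vectors to be close together, and a distance should never be negative. Cosine similarity ranges from -1 to 1, so subtracting it from 1 gives a value from 0 (same orientation) to 2 (opposite orientation).

That’s why Cosine Distance = 1 – Cosine Similarity.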
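
As a rough illustration of this relation, the sketch below rotates a unit vector away from a fixed reference vector and prints the cosine distance at each angle. The helper cosine_distance and the chosen angles are assumptions made for this example.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cos(theta) for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

reference = [1.0, 0.0]
distances = {}
for degrees in (0, 45, 90, 135, 180):
    theta = math.radians(degrees)
    other = [math.cos(theta), math.sin(theta)]  # unit vector at this angle
    distances[degrees] = cosine_distance(reference, other)
    print(f"{degrees:3d} degrees -> distance {distances[degrees]:.3f}")
```

The distance grows as the angle grows and stays non-negative throughout.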


Bag of Words Model:

Let’s take some documents:

D1: Mary loves Movies, Cinema and Arts

D2: John went to the Football game           

D3: Robert went to the Movie Delicatessen 


Here, you see the count of each word from those sentences:


Note: a blank box means zero (0).
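
One way to build such word-count vectors is sketched below. The tokenizer is an assumption made for this example (lowercase, letters only, no stop-word removal, no stemming), so the resulting counts will not necessarily match the article's table, which may have used different preprocessing.

```python
import re
from collections import Counter

documents = {
    "D1": "Mary loves Movies, Cinema and Arts",
    "D2": "John went to the Football game",
    "D3": "Robert went to the Movie Delicatessen",
}

def tokenize(text):
    """Lowercase the text and keep alphabetic words (assumed, very simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

# Word counts per document
counts = {name: Counter(tokenize(text)) for name, text in documents.items()}

# Shared vocabulary: every distinct word across all documents, in a fixed order
vocabulary = sorted(set().union(*counts.values()))

# One count vector per document; words that do not occur get 0
vectors = {name: [c[word] for word in vocabulary] for name, c in counts.items()}

print(vocabulary)
for name, vec in vectors.items():
    print(name, vec)
```

Every document is now a vector of the same length, so any pair of documents can be compared with the cosine formula.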

Cosine similarity between documents D1 and D3:


Using the cosine formula,

Cos Θ = 0.18257

Cosine Distance = 0.81743 

And the similarity between documents D2 and D3:

Cos Θ = 0.3651

Cosine Distance = 0.6349

We can clearly see that when the distance is smaller, the similarity is greater, and when the distance is larger, the two documents are less similar. Documents D2 and D3 are more similar than D1 and D3.
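
The comparison above can be sketched in a few lines. The two similarity values are taken from the article as given; the dictionary layout and variable names are assumptions for this example.

```python
# Cosine similarities to D3, as stated in the article
similarities = {
    ("D1", "D3"): 0.18257,
    ("D2", "D3"): 0.3651,
}

# Cosine distance = 1 - cosine similarity
distances = {pair: 1 - sim for pair, sim in similarities.items()}

# The pair with the smallest distance is the most similar one
closest_pair = min(distances, key=distances.get)

for pair, dist in distances.items():
    print(pair, round(dist, 5))
print("Most similar pair:", closest_pair)
```

A recommender built on this idea would suggest the item whose vector has the smallest distance to the one the user already looked at.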
 

Recommender systems are becoming essential in many industries and, hence, have received more attention. In this article, we saw that some basic mathematics can create huge things. Ultimately, a recommender system serves as a tool to improve both the client experience and the efficiency of advisors.




