Distance

Euclidean Distance

For points $x=(x_1,x_2,\dots,x_n)$ and $y=(y_1,y_2,\dots,y_n)$, the Euclidean distance is

$$d(x,y)=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}$$
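
A minimal NumPy sketch (the helper name `euclidean_distance` is just for illustration):

```python
import numpy as np

def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0 (the 3-4-5 triangle)
```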

Manhattan Distance

The name comes from driving in Manhattan: to get from one building to another, we must follow the street grid (the x and y axes) rather than take a straight line between the two points.

It sums the projections of the segment between the two points onto the coordinate axes. For points $a=(x_{11},x_{12},\dots,x_{1n})$ and $b=(x_{21},x_{22},\dots,x_{2n})$, it is

$$d_{ab}=\sum_{i=1}^{n}|x_{1i}-x_{2i}|$$
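
A quick sketch in the same style, summing absolute differences with NumPy:

```python
import numpy as np

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (the L1 norm of a - b)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b))

print(manhattan_distance([0, 0], [3, 4]))  # 7.0 = |0-3| + |0-4|
```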

Mahalanobis Distance

It is a very useful way of determining the “similarity” of a set of values from an “unknown”
sample to a collection of “known” samples. The Mahalanobis distance between a point $x$ and a distribution with mean $\mu$ and covariance matrix $S$ is

$$D_{x}=\sqrt{(x-\mu)^T S^{-1}(x-\mu)}$$

Compared with the Euclidean distance: “Euclidean distance only makes sense when all the dimensions have the same units (like meters), since it involves adding the squared value of them. When you are dealing with probabilities, a lot of times the features have different units. For example: I have a model for men and women, based on their weight [Kg] and height [m]. I know the mean and covariance for each. Now I get a new measurement set of weight and height and I try to decide if it’s a man or a woman. I can use the Mahalanobis distance from the models of both men and women to decide which is closer, meaning which is more probable.”
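
A minimal sketch of that idea, assuming we estimate $\mu$ and $S$ from a sample with NumPy (the sample values below are made up for illustration):

```python
import numpy as np

def mahalanobis_distance(x, mu, cov):
    """Distance from point x to a distribution with mean mu and covariance cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return np.sqrt(diff @ np.linalg.inv(cov) @ diff)

# Hypothetical (weight [kg], height [m]) samples for one class
samples = np.array([[80.0, 1.80], [75.0, 1.75], [85.0, 1.82], [78.0, 1.78]])
mu = samples.mean(axis=0)
cov = np.cov(samples, rowvar=False)  # rows are observations, columns are features

print(mahalanobis_distance([77.0, 1.76], mu, cov))
```

Computing this distance to both class models and choosing the smaller one is exactly the man/woman decision rule described in the quote.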

Chebyshev Distance

Defined as

$$D(p,q)=\max_{i}|p_i-q_i|$$

where $p$ and $q$ are the coordinates of the two points. It can also be written as

$$\lim_{k\to\infty}\left(\sum_{i=1}^{n}|p_i-q_i|^k\right)^{1/k}$$

so it is also called the $L_\infty$ distance.
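
In code it is simply the largest coordinate difference; a short NumPy sketch:

```python
import numpy as np

def chebyshev_distance(p, q):
    """Largest absolute coordinate difference (the L-infinity norm of p - q)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.max(np.abs(p - q))

print(chebyshev_distance([1, 2], [4, 3]))  # 3.0 = max(|1-4|, |2-3|)
```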

Minkowski Distance

A general family of distances between two points, defined as

$$d_{12}=\sqrt[p]{\sum_{k=1}^{n}|x_{1k}-x_{2k}|^p}$$

Where

$p=1$: it is the Manhattan distance

$p=2$: it is the Euclidean distance

$p \to \infty$: it is the Chebyshev distance, as the sketch below illustrates
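
A sketch that checks the three special cases numerically, using a large finite $p$ to approximate the limit:

```python
import numpy as np

def minkowski_distance(a, b, p):
    """p-th root of the sum of absolute coordinate differences raised to p."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a, b = [0, 0], [3, 4]
print(minkowski_distance(a, b, 1))    # 7.0  -- Manhattan
print(minkowski_distance(a, b, 2))    # 5.0  -- Euclidean
print(minkowski_distance(a, b, 100))  # ~4.0 -- approaches Chebyshev
```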

Standardized Euclidean Distance
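
The Euclidean distance with each dimension rescaled by its standard deviation, so that dimensions with different spreads (or units) contribute comparably:

$$d_{12}=\sqrt{\sum_{k=1}^{n}\left(\frac{x_{1k}-x_{2k}}{s_k}\right)^2}$$

where $s_k$ is the standard deviation of the $k$-th dimension over the data set.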

Bhattacharyya Distance
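
It measures the similarity of two probability distributions. For discrete distributions $p$ and $q$ over the same domain $X$,

$$D_B(p,q)=-\ln\left(\sum_{x\in X}\sqrt{p(x)q(x)}\right)$$

The sum inside the logarithm is the Bhattacharyya coefficient: it equals 1 for identical distributions and 0 for distributions with disjoint support.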

Cosine

For two vectors in $n$-dimensional space, the cosine similarity is defined as

$$\cos(\theta)=\frac{a\cdot b}{|a||b|}$$

If $a=(x_{11},x_{12},\dots,x_{1n})$ and $b=(x_{21},x_{22},\dots,x_{2n})$, then

$$\cos(\theta)=\frac{\sum_{k=1}^{n}x_{1k}x_{2k}}{\sqrt{\sum_{k=1}^{n}x_{1k}^2}\sqrt{\sum_{k=1}^{n}x_{2k}^2}}$$
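
A direct NumPy translation of the formula:

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity([1, 2], [2, 4]))  # 1.0 (parallel vectors)
print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal vectors)
```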

Jaccard Similarity Coefficient

Defined as

$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}$$
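
In plain Python, a small sketch using built-in sets:

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

print(jaccard_similarity({1, 2, 3}, {2, 3, 4}))  # 0.5: 2 shared of 4 distinct
```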

Pearson Correlation Coefficient

For two random variables $x$ and $y$, it measures their linear correlation:

$$\rho_{xy}=\frac{\mathrm{cov}(x,y)}{\sigma_x \sigma_y}=\frac{E[(x-\mu_x)(y-\mu_y)]}{\sigma_x \sigma_y}$$
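
A sample-based NumPy sketch, centering both variables and then normalizing:

```python
import numpy as np

def pearson_correlation(x, y):
    """Sample covariance of x and y over the product of their standard deviations."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0])
print(pearson_correlation(x, 2 * x + 1))  # 1.0  (perfect positive linear relation)
print(pearson_correlation(x, -x))         # -1.0 (perfect negative linear relation)
```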

Author: shixuan liu
Link: http://tedlsx.github.io/2019/08/30/distance/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.